John DeNero
Professor (verified) · University of California, Berkeley · Department of Statistics
Active 2005–2026
Research topics
- Computer Science
- Artificial Intelligence
- Machine Learning
- Information Retrieval
- Data Mining
- Data science
- Medicine
- Mathematics
- Statistics
- World Wide Web
Selected publications
Instructors' Perspectives on LLM-Generated Programming Formative Feedback
2026-02-13
article · Open access
We study instructor perspectives on LLM-generated programming feedback in an introductory Python course. LLM tutors predominantly offered debugging help, while human instructors preferred more diverse feedback types, including conceptual reminders, revisiting the problem, and examples. Cases where LLM tutor feedback diverged from human instructors' intent required major edits with different feedback types, while cases with closer alignment needed only minor changes with similar feedback types. Findings highlight the need for LLM tutors to reflect on instructor intent to ensure pedagogically aligned feedback.
The Missing Evaluation Axis: What 10,000 Student Submissions Reveal About AI Tutor Effectiveness
arXiv (Cornell University) · 2026-05-07
preprint · Open access
Current Artificial Intelligence (AI)-based tutoring systems (AI tutors) are primarily evaluated based on the pedagogical quality of their feedback messages. While important, pedagogy alone is insufficient because it ignores a critical question: what do students actually do with the feedback they receive? We argue that AI tutor evaluation should be extended with a behavioral dimension grounded in student interaction data, which complements pedagogical assessment. We propose an evaluation framework and apply it to 10,235 code submissions with corresponding AI tutor feedback from an introductory undergraduate programming course to measure whether students act on tutor feedback and whether those actions are applied correctly. Using this framework to compare two deployed AI tutors across different semesters in a large-scale introductory computer science course reveals substantial differences in student engagement patterns that are not captured by pedagogy-only evaluation. Moreover, these engagement-based behavioral signals are more strongly associated with student perception of helpful feedback than pedagogical quality alone, providing a more complete and actionable picture of AI tutor performance.
Misconception-Aware LLM Programming Tutor: Lessons Learned from Student-Tutor Interactions
2026-02-13
article · Open access
Large Language Models (LLMs) are increasingly used as programming tutors, but their feedback is often generic and prone to solution leakage. To address these issues, we present MisconceptionTutor, which grounds feedback in common student misconceptions. Through both pre-deployment analyses and a real-classroom deployment, we find that even simple prompting frameworks can meaningfully steer tutor behavior to be more pedagogically oriented and noticeably more satisfying to students.
2025-01-01 · 1 citation
article · Open access · Senior author
Pensieve Discuss: Scalable Small-Group CS Tutoring System with AI
2025-02-18 · 3 citations
article · Senior author
From Code to Concepts: Textbook-Driven Knowledge Tracing with LLMs in CS1
2025-02-18 · 4 citations
article · Open access
Gauging a student's understanding of course concepts, at an arbitrary point during a course, can be challenging. Standardized exams offer only a snapshot of performance rather than a deep understanding of progress. However, with Large Language Models (LLMs) now deployed at scale in CS1 courses, we can track multiple attempts from each student for every homework problem. This data provides insights into how students learn and deploy concepts over time, presenting a unique opportunity to rethink how we track changes in individual student knowledge. Traditional Knowledge Tracing (KT) methods often lack explainability and are computationally expensive. In contrast, our framework leverages an LLM to identify student progress on labeled, problem-level concepts from a student homework code submission. Our initial results show that the student's knowledge state can be dynamically updated. This knowledge state can then be used to provide more targeted, effective feedback and create tailored study materials.
Spotting AI Missteps: Students Take on LLM Errors in CS1
2025-02-18 · 4 citations
article
2025-02-12 · 22 citations
article · Open access
LLM-based chatbots enable students to get immediate, interactive help on homework assignments, but even a thoughtfully-designed bot may not serve all pedagogical goals. We report here on the development and deployment of a GPT-4-based interactive homework assistant ("61A Bot") for students in a large CS1 course; over 2000 students made over 100,000 requests of our Bot across two semesters. Our assistant offers one-shot, contextual feedback within the command-line "autograder" students use to test their code. Our Bot wraps student code in a custom prompt that supports our pedagogical goals and avoids providing solutions directly. Analyzing student feedback, questions, and autograder data, we find reductions in homework-related question rates in our course forum, as well as reductions in homework completion time when our Bot is available. For students in the 50th-80th percentile, reductions can exceed 30 minutes per assignment, up to 50% less time than students at the same percentile rank in prior semesters. Finally, we discuss these observations, potential impacts on student learning, and other potential costs and benefits of AI assistance in CS1.
A Knowledge-Component-Based Methodology for Evaluating AI Assistants
arXiv (Cornell University) · 2024-06-09
preprint · Open access
We evaluate an automatic hint generator for CS1 programming assignments powered by GPT-4, a large language model. This system provides natural language guidance about how students can improve their incorrect solutions to short programming exercises. A hint can be requested each time a student fails a test case. Our evaluation addresses three Research Questions: RQ1: Do the hints help students improve their code? RQ2: How effectively do the hints capture problems in student code? RQ3: Are the issues that students resolve the same as the issues addressed in the hints? To address these research questions quantitatively, we identified a set of fine-grained knowledge components and determined which ones apply to each exercise, incorrect solution, and generated hint. Comparing data from two large CS1 offerings, we found that access to the hints helps students to address problems with their code more quickly, that hints are able to consistently capture the most pressing errors in students' code, and that hints that address a few issues at once rather than a single bug are more likely to lead to direct student progress.
Frequent coauthors
- Dan Klein · 20 shared
- Joern Wuebker · 18 shared
- Anobel Y. Odisho (University of California, San Francisco) · 9 shared
- Briton Park · 8 shared
- Nicholas Altieri (Flatiron Health (United States)) · 7 shared
- Bin Yu (Beihang University) · 7 shared
- Brian Hou (University of Washington) · 6 shared
- Thomas Zenkel · 6 shared