John DeNero

Professor · Verified

University of California, Berkeley · Department of Statistics

Active 2005–2026

h-index: 24

Citations: 1.9k

Papers: 82 (28 in the last 5 years)

Funding: not listed

Research topics

Computer Science
Artificial Intelligence
Machine Learning
Information Retrieval
Data Mining
Data Science
Medicine
Mathematics
Statistics
World Wide Web

Selected publications

Instructors' Perspectives on LLM-Generated Programming Formative Feedback
2026-02-13
article · Open access
We study instructor perspectives on LLM-generated programming feedback in an introductory Python course. LLM tutors predominantly offered debugging help, while human instructors preferred more diverse feedback types, including conceptual reminders, revisiting the problem, and examples. Cases where LLM tutor feedback diverged from human instructors' intent required major edits with different feedback types, while cases with closer alignment needed only minor changes with similar feedback types. Findings highlight the need for LLM tutors to reflect on instructor intent to ensure pedagogically aligned feedback.
The Missing Evaluation Axis: What 10,000 Student Submissions Reveal About AI Tutor Effectiveness
arXiv (Cornell University) · 2026-05-07
preprint · Open access
Current Artificial Intelligence (AI)-based tutoring systems (AI tutors) are primarily evaluated based on the pedagogical quality of their feedback messages. While important, pedagogy alone is insufficient because it ignores a critical question: what do students actually do with the feedback they receive? We argue that AI tutor evaluation should be extended with a behavioral dimension grounded in student interaction data, which complements pedagogical assessment. We propose an evaluation framework and apply it to 10,235 code submissions with corresponding AI tutor feedback from an introductory undergraduate programming course to measure whether students act on tutor feedback and whether those actions are applied correctly. Using this framework to compare two deployed AI tutors across different semesters in a large-scale introductory computer science course reveals substantial differences in student engagement patterns that are not captured by pedagogy-only evaluation. Moreover, these engagement-based behavioral signals are more strongly associated with student perception of helpful feedback than pedagogical quality alone, providing a more complete and actionable picture of AI tutor performance.
Misconception-Aware LLM Programming Tutor: Lessons Learned from Student-Tutor Interactions
2026-02-13
article · Open access
Large Language Models (LLMs) are increasingly used as programming tutors, but their feedback is often generic and prone to solution leakage. To address these issues, we present MisconceptionTutor, which grounds feedback in common student misconceptions. Through both pre-deployment analyses and a real-classroom deployment, we find that even simple prompting frameworks can meaningfully steer tutor behavior to be more pedagogically oriented and noticeably more satisfying to students.
Cross-lingual Human-Preference Alignment for Neural Machine Translation with Direct Quality Optimization
2025-01-01 · 1 citation
article · Open access · Senior author
Pensieve Discuss: Scalable Small-Group CS Tutoring System with AI
2025-02-18 · 3 citations
article · Senior author
From Code to Concepts: Textbook-Driven Knowledge Tracing with LLMs in CS1
2025-02-18 · 4 citations
article · Open access
Gauging a student's understanding of course concepts, at an arbitrary point during a course, can be challenging. Standardized exams offer only a snapshot of performance rather than a deep understanding of progress. However, with Large Language Models (LLMs) now deployed at scale in CS1 courses, we can track multiple attempts from each student for every homework problem. This data provides insights into how students learn and deploy concepts over time, presenting a unique opportunity to rethink how we track changes in individual student knowledge. Traditional Knowledge Tracing (KT) methods often lack explainability and are computationally expensive. In contrast, our framework leverages an LLM to identify student progress on labeled, problem-level concepts from a student homework code submission. Our initial results show that the student's knowledge state can be dynamically updated. This knowledge state can then be used to provide more targeted, effective feedback and create tailored study materials.
Spotting AI Missteps: Students Take on LLM Errors in CS1
2025-02-18 · 4 citations
article
61A Bot Report: AI Assistants in CS1 Save Students Homework Time and Reduce Demands on Staff. (Now What?)
2025-02-12 · 22 citations
article · Open access
LLM-based chatbots enable students to get immediate, interactive help on homework assignments, but even a thoughtfully-designed bot may not serve all pedagogical goals. We report here on the development and deployment of a GPT-4-based interactive homework assistant ("61A Bot") for students in a large CS1 course; over 2000 students made over 100,000 requests of our Bot across two semesters. Our assistant offers one-shot, contextual feedback within the command-line "autograder" students use to test their code. Our Bot wraps student code in a custom prompt that supports our pedagogical goals and avoids providing solutions directly. Analyzing student feedback, questions, and autograder data, we find reductions in homework-related question rates in our course forum, as well as reductions in homework completion time when our Bot is available. For students in the 50th to 80th percentile, reductions can exceed 30 minutes per assignment, up to 50% less time than students at the same percentile rank in prior semesters. Finally, we discuss these observations, potential impacts on student learning, and other potential costs and benefits of AI assistance in CS1.
A Knowledge-Component-Based Methodology for Evaluating AI Assistants
arXiv (Cornell University) · 2024-06-09
preprint · Open access
We evaluate an automatic hint generator for CS1 programming assignments powered by GPT-4, a large language model. This system provides natural language guidance about how students can improve their incorrect solutions to short programming exercises. A hint can be requested each time a student fails a test case. Our evaluation addresses three Research Questions: RQ1: Do the hints help students improve their code? RQ2: How effectively do the hints capture problems in student code? RQ3: Are the issues that students resolve the same as the issues addressed in the hints? To address these research questions quantitatively, we identified a set of fine-grained knowledge components and determined which ones apply to each exercise, incorrect solution, and generated hint. Comparing data from two large CS1 offerings, we found that access to the hints helps students to address problems with their code more quickly, that hints are able to consistently capture the most pressing errors in students' code, and that hints that address a few issues at once rather than a single bug are more likely to lead to direct student progress.

Frequent coauthors

Dan Klein
20 shared
Joern Wuebker
18 shared
Anobel Y. Odisho
University of California, San Francisco
9 shared
Briton Park
8 shared
Nicholas Altieri
Flatiron Health (United States)
7 shared
Bin Yu
Beihang University
7 shared
Brian Hou
University of Washington
6 shared
Thomas Zenkel
6 shared
