
Anthony Botelho
Assistant Professor · Verified · University of Florida · Education
Active 2015–2026
About
Anthony Botelho is a faculty member at the University of Florida, where he is involved in the College of Education. His research focuses on the history and social foundations of education, with particular interest in community-based education, youth political education, and sustainable movement-building. His scholarship draws on archival, oral history, and community-engaged methods to document and analyze the history of child and youth work, notably at the Highlander Research and Education Center, emphasizing how the care and political education of young people support social movement sustainability. Botelho's academic background includes a Ph.D. in Educational Theory and Practice with a Critical Studies emphasis from the University of Georgia, an M.Ed. in Educational Administration and Policy from the same institution, and a B.A. in Political Science from the University of Florida. His work has contributed to understanding the intersections of education, social justice, and community activism, and he has been involved in various grants and scholarly projects related to civic engagement, social justice, and the history of education.
Research topics
- Computer Science
- Psychology
- Mathematics education
- Artificial Intelligence
- Engineering
- Multimedia
- Mathematics
- World Wide Web
- Pedagogy
Selected publications
Evaluating the causal effects of digital use quality on students’ ICT literacy
Journal of Research on Technology in Education · 2026-01-08 · 1 citation
article
How to Assess AI Literacy: Misalignment Between Self-Reported and Objective-Based Measures
2026-04-25
article
2026-02-27
article · Senior author
Creativity assessment at scale is difficult because expert ratings are resource-intensive and hard to use in dynamic settings. Large language models (LLMs) offer potential for automated assessment, yet their validity for evaluating multimodal creative artifacts in authentic educational contexts remains unexplored. Here we evaluated whether LLMs can assess human creativity in Physics Playground, an educational physics game where students design playable levels. We compared rubric-guided and rubric-free prompting approaches across 421 student-created levels, tested three multimodal input configurations, and examined reliability and model capacity effects using GPT and Gemini model families. Rubric-guided prompting yielded strong agreement with human expert ratings compared to rubric-free approaches (rs ranged from .61 to .81). Multimodal inputs combining images with structured data significantly enhanced validity compared to text-only methods. These effects were consistent across GPT-4o and Gemini 2.5 Flash. Also, single model calls achieved comparable reliability to averaged responses. Model capacity substantially influenced performance, with larger, high-capacity models (e.g., GPT-4o) consistently outperforming smaller, low-capacity variants (e.g., GPT-4o-mini). Theoretically, these findings extend creativity assessment to multimodal artifacts in authentic contexts. Practically, embedding assessment in learning games enables them to foster creativity and support STEM learning and AI literacy.
OSF Preprints · 2026-04-12
other · 1st author · Corresponding
Disentangling Learning from Judgment: Representation Learning for Open Response Analytics
2026-04-25
article · Open access · Senior author
Open-ended responses are central to learning, yet automated scoring often conflates what students wrote with how teachers grade. We present an analytics-first framework that separates content signals from rater tendencies, making judgments visible and auditable via analytics. Using de-identified ASSISTments mathematics responses, we model teacher histories as dynamic priors and represent text with sentence embeddings. We apply centroid normalization and response–problem embedding differences, and explicitly model teacher effects with priors to reduce problem- and teacher-related confounds. Temporally-validated linear models quantify the contributions of each signal, and model disagreements surface observations for qualitative inspection. Results show that teacher priors heavily influence grade predictions; the strongest results arise when priors are combined with content embeddings (AUC ≈ 0.815), while content-only models remain above chance but substantially weaker (AUC ≈ 0.626). Adjusting for rater effects sharpens the selection of features derived from content representations, retaining more informative embedding dimensions and revealing cases where semantic evidence supports understanding as opposed to surface-level differences in how students respond. The contribution presents a practical pipeline that transforms embeddings from mere features into learning analytics for reflection, enabling teachers and researchers to examine where grading practices align (or conflict) with evidence of student reasoning and learning.
Examining Students' Code Comprehension with LLMs in Block- and Text-Based Programming
2026-02-13
article · Open access · Senior author
Understanding how students reason about code is essential for providing tailored scaffolding in computer science (CS) education. Prior work has used think-aloud protocols with the Structure of the Observed Learning Outcomes (SOLO) taxonomy to examine students' code comprehension and programming levels. However, analyzing such data is labor-intensive and requires expert judgment. Recent advances in large language models (LLMs) offer a promising avenue for scaling this analysis, though their reliability for fine-grained coding remains uncertain. To address this gap, our study investigates the extent to which GPT-5 and 4o can classify SOLO levels and identify code-comprehension strategies from think-aloud transcripts of 27 high-school students working on block-based and text-based tasks. Results show modest alignment with human ratings for SOLO, with one-shot prompting improving agreement over zero-shot, though distinctions between adjacent lower levels (e.g., Prestructural 1 vs. 2) remained difficult. Strategy detection demonstrated stronger performance, achieving accuracies of 75–77% (block) and 62–67% (text), particularly for surface-visible strategies such as 'walkthroughs', 'control-structure identification', and 'pattern recognition', but weaker for less frequent, abstract, meta-cognitive strategies such as 'strategizing' (planning an approach) or 'thoroughness' (systematically checking work). These findings highlight both the potential and the limitations of using GPT-5 and 4o to analyze think-aloud data. While this work represents an initial step, with plans to examine more models, our preliminary results indicate that a human-in-the-loop approach is essential to ensure reliability and interpretive depth. Future work will extend this evaluation to other LLMs to better understand their role in supporting instructional decision-making.
Using LLMs to Identify Indicators of Persistence from Students' Dialogues with a Pedagogical Agent
Zenodo (CERN European Organization for Nuclear Research) · 2026-03-03
article · Open access · Senior author
Conversational learning systems offer new opportunities to examine learning processes through chat log data. Constructs such as persistence, self-efficacy, interest, perceived challenge, and prior knowledge are known predictors of student performance but are challenging to detect at scale using traditional methods. This study explores the use of Large Language Models (LLMs) to automatically code indicators of these constructs from student chat logs collected through a conversation-based assessment (CBA) for middle school mathematics. Indicators included observable behaviors such as students' expressions of challenge, help-seeking, goal-setting, and self-regulatory strategies evident in their conversational interactions within the CBA. We evaluated multiple configurations of ChatGPT4o, varying temperature settings (0, .3, .7, 1) and model types (mini vs. regular), against human expert coders. The dataset comprised over 10,000 student turns collected from 107 middle school students classified as English learners as they interact with the CBA. Reliability was assessed within and between LLM configurations and humans. Results reveal systematic patterns: constructs with moderate theoretical coherence benefited from higher temperatures, while well-defined constructs required deterministic settings. Self-efficacy showed the highest human-LLM alignment. These findings illustrate the challenges of measuring complex psychological constructs and highlight the promise of human-LLM collaboration to enhance qualitative coding efficiency and validity in educational research. Supplemental materials are available online here: https://doi.org/10.17605/osf.io/s85ck.
Let Me Try Again: Examining Replay Behavior by Tracing Students' Latent Problem-Solving Pathways
ArXiv.org · 2026-01-03
article · Open access · Senior author
Prior research has shown that students' problem-solving pathways in game-based learning environments reflect their conceptual understanding, procedural knowledge, and flexibility. Replay behaviors, in particular, may indicate productive struggle or broader exploration, which in turn foster deeper learning. However, little is known about how these pathways unfold sequentially across problems or how the timing of replays and other problem-solving strategies relates to proximal and distal learning outcomes. This study addresses these gaps using Markov Chains and Hidden Markov Models (HMMs) on log data from 777 seventh graders playing the game-based learning platform of From Here to There!. Results show that within problem sequences, students often persisted in states or engaged in immediate replay after successful completions, while across problems, strong self-transitions indicated stable strategic pathways. Four latent states emerged from HMMs: Incomplete-dominant, Optimal-ending, Replay, and Mixed. Regression analyses revealed that engagement in replay-dominant and optimal-ending states predicted higher conceptual knowledge, flexibility, and performance compared with the Incomplete-dominant state. Immediate replay consistently supported learning outcomes, whereas delayed replay was weakly or negatively associated in relation to Non-Replay. These findings suggest that replay in digital learning is not uniformly beneficial but depends on timing, with immediate replay supporting flexibility and more productive exploration.
How to Assess AI Literacy: Misalignment Between Self-Reported and Objective-Based Measures
arXiv (Cornell University) · 2026-01-03
preprint · Open access
The widespread adoption of Artificial Intelligence (AI) in K-12 education highlights the need for psychometrically-tested measures of teachers' AI literacy. Existing work has primarily relied on either self-report (SR) or objective-based (OB) assessments, with few studies aligning the two within a shared framework to compare perceived versus demonstrated competencies or examine how prior AI literacy experience shapes this relationship. This gap limits the scalability of learning analytics and the development of learner profile-driven instructional design. In this study, we developed and evaluated SR and OB measures of teacher AI literacy within the established framework of Concept, Use, Evaluate, and Ethics. Confirmatory factor analyses support construct validity with good reliability and acceptable fit. Results reveal a low correlation between SR and OB factors. Latent profile analysis identified six distinct profiles, including overestimation (SR > OB), underestimation (SR < OB), alignment (SR close to OB), and a unique low-SR/low-OB profile among teachers without AI literacy experience. Theoretically, this work extends existing AI literacy frameworks by validating SR and OB measures on shared dimensions. Practically, the instruments function as diagnostic tools for professional development, supporting AI-informed decisions (e.g., growth monitoring, needs profiling) and enabling scalable learning analytics interventions tailored to teacher subgroups.
Let Me Try Again: Examining Replay Behavior by Tracing Students' Latent Problem-Solving Pathways
arXiv (Cornell University) · 2026-01-03
preprint · Open access · Senior author
Frequent coauthors
- Neil T. Heffernan, Worcester Polytechnic Institute (32 shared)
- John A. Erickson, Western Kentucky University (7 shared)
- Ryan S. Baker (6 shared)
- Ashvini Varatharaj (6 shared)
- Adam Sales, Worcester Polytechnic Institute (6 shared)
- Seth Adjei, Northern Kentucky University (5 shared)
- Sami Baral, Worcester Polytechnic Institute (5 shared)
- Thanaporn Patikorn, Worcester Polytechnic Institute (5 shared)
Labs
University of Florida College of Education · PI
Awards & honors
- Civic and Voter Engagement Fellowship (2024-25)
- Grants to Scholars Program (2024-25)
- Junior Faculty Award (2024-25)
- Mutual Mentoring Grant (2023-24)
- IDEA Innovation Grant (2023-24)