
Edward Haertel
Stanford University · Social and Cultural Analysis in Education
Active 1975–2018
About
Dr. Edward Haertel is an Emeritus Professor at the Stanford Graduate School of Education, where he specializes in educational testing and assessment. His research and teaching focus on psychometrics and educational policy, with particular emphasis on test-based accountability and the policy uses of test data. His recent work has examined standard setting methods, limitations of value-added models for teacher and school accountability, impacts of testing on curriculum, students, and educational policy, as well as test reliability and generalizability theory. Dr. Haertel is recognized for his expertise in assessment, testing and measurement, international and comparative education, school reform, standards, and teachers and teaching.
Research topics
- Psychology
- Mathematics education
- Computer science
- Statistics
- Sociology
Selected publications
Classroom Assessment of Sociocultural Interactions
PsycTESTS Dataset · 2018-01-01
dataset · Senior author
Tests, Test Scores, and Constructs
Educational Psychologist · 2018-06-29 · 14 citations
article · 1st author · Corresponding
In the service of educational accountability, student achievement tests are being used to measure constructs quite unlike those envisioned by test developers. Scores are compared to cut points to create classifications like “proficient”; scores are combined over time to measure growth; student scores are aggregated to measure the effectiveness of teachers, schools, and school districts; indices are created to measure college and career readiness. These and other new uses rely on derived scores created to measure new constructs. The field of educational and psychological measurement has largely ignored these significant, consequential measurement applications. The conceptual frameworks and analytical tools of educational and psychological measurement should be used to study such derived scores and the validity of their uses and interpretations.
American Journal of Education · 2018-04-19
article · 1st author · Corresponding
Measuring Cultural Dimensions of Classroom Interactions
Educational Assessment · 2018-09-21 · 37 citations
article · Senior author
We trace the development and analyze the generalizability of the Classroom Assessment of Sociocultural Interactions (CASI), an observation system designed to measure cultural dimensions of classroom interactions. We establish CASI measurement properties by analyzing panoramic videos of 4th and 5th grade classrooms from the Measures of Effective Teaching project, and argue for its significance in terms of achievement opportunity for minoritized students and needed evidence regarding equitable teaching. We frame ten dimensions of sociocultural interactions within three domains: Life Applications (i.e., connections with what students know and do outside of school); Self in Group (i.e., interdependence to motivate learning and foster social identities); and Agency (i.e., how freedom and choice are managed). We demonstrate how measurement error is associated with raters, lessons, and lesson segments, and discuss implications for CASI refinement, as well as appropriate instrument uses to enrich learning opportunities for minoritized students across a variety of classroom settings.
Tests, Test Scores, Constructs and Success in the World (Thorndike Career Achievement Award)
PsycEXTRA Dataset · 2017-01-01
dataset · 1st author · Corresponding
2016-09-19 · 6 citations
book-chapter · 1st author · Corresponding
This chapter addresses the logic of validation for uses and interpretations of such derived scores. The derived score, which can be seen as an extension of the original score that measures an application construct, should require a second round of validation, looping through scoring, generalization, extrapolation, and use a second time. Validation would begin with the test developer’s investigation of achievement tests used to measure reading and mathematics achievement on continuous scales. Then, a second looping through Kane’s four stages would address the derived scores created when scale scores are mapped into “Below Basic,” “Basic,” “Proficient,” and “Advanced” according to cut scores defined by a judgmental standard setting process. Teacher effectiveness estimates from value-added models may be regarded as derived scores created by a complex measurement procedure to measure an application construct far removed from individual student scores obtained at a single point in time.
Engaging Methodological Pluralism
2016-05-01 · 57 citations
book-chapter · Senior author
Commentary on Chapters 12–15: Future Directions
2015-08-20
book-chapter · Open access · 1st author · Corresponding
This chapter considers some implications of technology for score comparability and offers a useful framework for future investigation as both item formats and varieties of interactive devices evolve. It offers a comprehensive and informative history and overview of diagnostic classification models (DCMs), especially the log-linear cognitive diagnosis model (LCDM), which subsumes many earlier DCMs as special cases. DCMs will be most useful when applied to tests designed from the outset to differentiate among specific student misconceptions or skill profiles. The chapter shows the various ways in which computer-based testing (CBT) has spurred new developments in IRT and addresses a range of concerns beyond the obvious requirement for item-selection algorithms for computerized adaptive testing (CAT). It discusses the trajectories of technical developments in CAT versus automatic essay scoring (AES) and proposes that progress was less rapid in AES in part because much of the early research and development work was proprietary.
Reflections on the Gordon Commission
Teachers College Record: The Voice of Scholarship in Education · 2014-11-01 · 1 citation
article · 1st author · Corresponding
Background: This brief reflection on the work of the Gordon Commission calls out significant themes and implications found in the various papers authored by the commissioners and other scholars, especially those included in this special issue of Teachers College Record. Purpose: The forward-looking vision of the Gordon Commission is contrasted with contemporary teaching and testing practices to highlight implications for new assessment purposes and methods. It is argued that a new vision of assessment is inseparable from a new vision of teaching and learning. To realize this new vision, some current practices, especially uses of testing to sort and select students and to rank teachers and schools, will need to be greatly attenuated or even abandoned. Research Design: This is a narrative review expressing the author's own point of view. No empirical findings are cited. Conclusions: A conservative reading of the Gordon Commission's work might suggest that educational assessment tomorrow should function much as it does today, only better. A closer reading, however, suggests a more radical view. Assessment FOR education must break free from the constraints of standardization and consequential comparison.
TOEFL iBT research reports · 2013-01-01 · 43 citations
article · 1st author · Corresponding
Frequent coauthors
- Robert C. Calfee · California Polytechnic State University · 16 shared
- Linda Darling‐Hammond · Learning Policy Institute · 10 shared
- Geneva D. Haertel · SRI International · 10 shared
- Herbert J. Walberg · 8 shared
- Pamela Moss · 8 shared
- Diana Pullin · 7 shared
- Andrew Ho · 7 shared
- Jesse Rothstein · University of California, Berkeley · 7 shared