
Richard Scheines
· The Bess Family Dean of the Marianna Brown Dietrich College of Humanities and Social Sciences · Carnegie Mellon University · Philosophy
Active 1986–2018
About
Richard Scheines is the Bess Family Dean of the Marianna Brown Dietrich College of Humanities and Social Sciences at Carnegie Mellon University, where he has served as a professor since 2003 and as Dean since 2014. His research focuses on causal discovery, particularly the problem of learning about causation from statistical evidence. This work is embodied in the TETRAD project, which represents nearly 25 years of collaboration with Clark Glymour, Peter Spirtes, and others, and involves building efficient algorithms for causal discovery that integrate computer science and philosophy. Scheines holds a Ph.D. in History and Philosophy of Science from the University of Pittsburgh, with a thesis on causal models in the social sciences. His areas of specialization include Philosophy of Science (Causation), Artificial Intelligence (Machine Learning), and Educational Computing (Online Courses and Virtual Labs). He has courtesy appointments in the Machine Learning Department and the Human-Computer Interaction Institute at Carnegie Mellon. His professional activities include visiting scholar positions at UC Berkeley, UCLA, and the University of Groningen, and he has been recognized with awards such as the Causality in Statistics Education Award in 2013. His work extends into educational software development, causal inference, and policy advisory roles, contributing significantly to the fields of philosophy of science, machine learning, and educational data mining.
Research topics
- Computer science
- Mathematics
- Econometrics
- Artificial intelligence
- Psychology
Selected publications
Analysis of Microarray Data for Treated Fat Cells
Figshare · 2018-06-29
article · Open access
DNA microarrays are perfectly suited for comparing gene expression in different populations of cells. An important application of microarray techniques is identifying genes which are activated by a particular drug of interest. This process will allow biologists to identify therapies targeted to particular diseases, and, eventually, to gain more knowledge about the biological processes in organisms. Such an application is described in this paper. It is focused on diabetes and obesity, which are genetically heterogeneous diseases, meaning that multiple defective genes are responsible for them. The paper is divided into three parts, each dealing with a different problem addressed in our study. First we validate the data from our microarray experiment. We identified significant systematic sources of variability which are potentially issues for other microarray datasets. Second, we applied multiple hypothesis testing to identify differentially expressed genes. We found a set of genes which appear to change in expression level over time in response to a drug treatment. Third, we tried to address the problem of identification of co-expressed genes using cluster analysis. This last problem is still under discussion.
Student Profiling from Tutoring System Log Data: When do Multiple Graphical Representations Matter?
Figshare · 2018-06-29 · 2 citations
article · Open access · Senior author
We analyze log-data generated by an experiment with Mathtutor, an intelligent tutoring system for fractions. The experiment compares the educational effectiveness of instruction with single and multiple graphical representations. We extract the error-making and hint-seeking behaviors of each student to characterize their learning strategy. Using an expectation-maximization approach, we cluster the students by their strategic profile. We find that a) experimental condition and learning outcome are clearly associated, b) experimental condition and learning strategy are not, and c) almost all of the association between experimental condition and learning outcome is found among students implementing just one of the learning strategies we identify. This class of students is characterized by relatively high rates of error as well as a marked reluctance to seek help. They also show the greatest educational gains from instruction with multiple rather than single representations. The behaviors that characterize this group illuminate the mechanism underlying the effectiveness of multiple representations and suggest strategies for tailoring instruction to individual students. Our methodology can be implemented in an on-line tutoring system to dynamically tailor individualized instruction.
An Experimental Comparison of Alternative Proof Construction Environments
Research Showcase @ Carnegie Mellon University (Carnegie Mellon University) · 2018-01-01 · 3 citations
article · Open access · 1st author · Corresponding
Abstract: "In this paper we compare computerized environments in which students complete proof construction exercises in formal logic. After being given a pretest for logical aptitude, three matched groups were presented identical course material on logic for approximately five weeks by a computer. During the treatment, all students were required to complete several hundred proof construction exercises. The three groups did the exercises and the midterm in different environments. The group with a more sophisticated interface performed better on the midterm. Nearly all the difference in performance showed up in the harder problems. In a follow up experiment in which flexible strategic problem solving help was added to the environment, performance improved slightly, but the data are inconclusive."
Genetic Algorithm Search Over Causal Models
Figshare · 2018-06-29 · 14 citations
article · Open access · Senior author
Shane Harwood and Richard Scheines. Genetic Algorithm Search Over Causal Models.
Estimating Latent Causal Influences: TETRAD III Variable Selection and Bayesian Parameter Estimation
Figshare · 2018-06-29 · 1 citation
article · Open access · 1st author · Corresponding
The statistical evidence for the detrimental effect of exposure to low levels of lead on the cognitive capacities of children has been debated for several decades. In this paper I describe how two techniques from artificial intelligence and statistics help make the statistical evidence for the accepted epidemiological conclusion seem decisive. The first is a variable-selection routine in TETRAD III for finding causes, and the second a Bayesian estimation of the parameter reflecting the causal influence of Actual Lead Exposure, a latent variable, on the measured IQ score of middle class suburban children.
Time and Attention: Students, Sessions, and Tasks
Research Showcase @ Carnegie Mellon University (Carnegie Mellon University) · 2018-01-01 · 11 citations
article
Students in two classes in the fall of 2004 making extensive use of online courseware were logged as they visited over 500 different “learning pages” which varied in length and in difficulty. We computed the time spent on each page by each student during each session they were logged in. We then modeled the time spent for a particular visit as a function of the page itself, the session, and the student. Surprisingly, the average time a student spent on learning pages (over their whole course experience) was of almost no value in predicting how long they would spend on a given page, even controlling for the session and page difficulty. The page itself was highly predictive, but so was the average time spent on learning pages in a given session. This indicates that local considerations, e.g., mood, deadline proximity, etc., play a much greater role in determining student pace and attention than do intrinsic student traits. We also consider the average time spent on learning pages as a function of the time of semester. Students spent less time on pages later in the semester, even for more demanding material.
Is the Doer Effect Robust across Multiple Data Sets
Educational Data Mining · 2018-07-01 · 9 citations
article
Brooklyn law review · 2018-06-29 · 7 citations
article · Open access · 1st author · Corresponding
Department of Philosophy technical report
Figshare · 2018-06-29 · 14 citations
article · Open access · Senior author
Although learning from multiple representations has been shown to be effective in a variety of domains, little is known about the mechanisms by which it occurs. We analyzed log data on error-rate, hint-use, and time-spent obtained from two experiments with a Cognitive Tutor for fractions. The goal of the experiments was to compare learning from multiple graphical representations of fractions to learning from a single graphical representation. Finding that a simple statistical model did not fit data from either experiment, we searched over all possible mediation models consistent with background knowledge, finding several that fit the data well. We also searched over alternative measures of student error-rate, hint-use, and time-spent to see if our data were better modeled with simple monotonic or u-shaped non-monotonic relationships. We found no evidence for non-monotonicity. No matter what measures we used, time-spent was irrelevant, and hint-use was only occasionally relevant. Although the total effect of multiple representations on learning was positive, they also had a negative effect on learning, mediated by a higher error-rate. Our evidence suggests that multiple representations increase error-rate, which in turn inhibits learning. The mechanisms by which multiple representations improve learning are as yet unmodeled.
Unidimensional Linear Latent Variable Models
Research Showcase @ Carnegie Mellon University (Carnegie Mellon University) · 2018-06-29 · 3 citations
article · Open access · 1st author · Corresponding
Abstract: "Linear structural equation models with latent (unmeasured) variables are used widely in sociology, psychometrics, and political science. When such models have a unidimensional (pure) measurement model (Gerbing and Anderson 82, 88; Scheines 92) they imply constraints on the measured covariances which can be used to either confirm unidimensionality or find submodels which are unidimensional. Assuming unidimensionality, the causal relations among the latent variables can be partially determined by examining other (related) constraints on the measured covariances. In this paper I prove first that unidimensionality is detectible from constraints on only the measured covariances no matter what the structure among latent variables, and second that in a structural equation model with a unidimensional measurement model, for any three latents $T_i$, $T_j$, and $T_k$, $\rho_{T_i T_j \cdot T_k} = 0$ only if certain constraints hold on only the measured covariances."
Frequent coauthors
- Clark Glymour · 82 shared
- Peter Spirtes · Carnegie Mellon University · 78 shared
- Justin Sytsma · Victoria University of Wellington · 16 shared
- Édouard Machery · 16 shared
- Jonathan Livengood · University of Illinois Urbana-Champaign · 16 shared
- Adam Feltz · University of Oklahoma · 16 shared
- Kevin T. Kelly · 15 shared
- Christopher Meek · 7 shared
Education
- Ph.D., History and Philosophy of Science · University of Pittsburgh · 1987
Awards & honors
- Causality in Statistics Education Award – 2013
- Best Paper Award – 2013 6th International Workshop on Educat…
- Best Paper Award – 2008 1st International Workshop on Educat…