David Ruppert

Cornell University · Operations Research and Information Engineering

Active 1971–2025

h-index: 73
Citations: 28.7k
Papers: 435 (35 in the last 5 years)
Funding: $266k
About

David Ruppert is the Andrew Schulz Jr. Professor of Engineering at the School of Operations Research and Information Engineering and also a Professor of Statistical Science at Cornell University. He holds a BA in Mathematics from Cornell University, an MA in Mathematics from the University of Vermont, and a PhD in Statistics and Probability from Michigan State University. His academic career includes positions as Assistant and Associate Professor of Statistics at the University of North Carolina, Chapel Hill, from 1977 to 1987. Professor Ruppert is a Fellow of the American Statistical Association (ASA) and the Institute of Mathematical Statistics (IMS), and he received the Wilcoxon Prize in 1986. Recognized as a highly cited researcher, he has been ranked 21st in mathematics by journal citations and has mentored 29 PhD students, many of whom are now leading researchers.

Research topics

  • Mathematics
  • Statistics
  • Computer science
  • Econometrics
  • Applied mathematics

Selected publications

  • Bayesian Functional Data Analysis in Astronomy

    2025-11-04

    article · Open access · Senior author

    Cosmic demographics—the statistical study of populations of astrophysical objects—has long relied on tools from multivariate statistics for analyzing data comprising fixed-length vectors of properties of objects, as might be compiled in a tabular astronomical catalog (say, with sky coordinates, and brightness measurements in a fixed number of spectral passbands). But beginning with the emergence of automated digital sky surveys, ca. 2000, astronomers began producing large collections of data with more complex structures: light curves (brightness time series) and spectra (brightness vs. wavelength). These comprise what statisticians call functional data—measurements of populations of functions. Upcoming automated sky surveys will soon provide astronomers with a flood of functional data. New methods are needed to accurately and optimally analyze large ensembles of light curves and spectra, accumulating information both along individual measured functions and across a population of such functions. Functional data analysis (FDA) provides tools for statistical modeling of functional data. Astronomical data presents several challenges for FDA methodology, e.g., sparse, irregular, and asynchronous sampling, and heteroscedastic measurement error. Bayesian FDA uses hierarchical Bayesian models for function populations, and is well suited to addressing these challenges. We provide an overview of astronomical functional data and some key Bayesian FDA modeling approaches, including functional mixed effects models, and stochastic process models. We briefly describe a Bayesian FDA framework combining FDA and machine learning methods to build low-dimensional parametric models for galaxy spectra.

  • Bayesian analysis of regression discontinuity designs with heterogeneous treatment effects

    ArXiv.org · 2025-04-14

    preprint · Open access · Senior author

    Regression Discontinuity Design (RDD) is a popular framework for estimating a causal effect in settings where treatment is assigned if an observed covariate exceeds a fixed threshold. We consider estimation and inference in the common setting where the sample consists of multiple known sub-populations with potentially heterogeneous treatment effects. In the applied literature, it is common to account for heterogeneity by either fitting a parametric model or considering each sub-population separately. In contrast, we develop a Bayesian hierarchical model using Gaussian process regression which allows for non-parametric regression while borrowing information across sub-populations. We derive the posterior distribution, prove posterior consistency, and develop a Metropolis-Hastings within Gibbs sampling algorithm. In extensive simulations, we show that the proposed procedure outperforms existing methods in both estimation and inferential tasks. Finally, we apply our procedure to U.S. Senate election data and discover an incumbent party advantage which is heterogeneous over different time periods.

  • Correction to: Dynamic Shrinkage Processes

    Journal of the Royal Statistical Society Series B (Statistical Methodology) · 2024-11-11

    article · Open access · Senior author

  • A novel approach to assessing the joint effects of mercury and fish consumption on neurodevelopment in the New Bedford Cohort

    American Journal of Epidemiology · 2024-06-28 · 5 citations

    article · Open access

    Understanding health risks from methylmercury (MeHg) exposure is complicated by its link to fish consumption, which may confound or modify toxicities. One solution is to include fish intake and a biomarker of MeHg exposure in the same analytical model, but resulting estimates do not reflect the independent impact of accumulated MeHg or fish exposure. In fish-eating populations, this can be addressed by separating MeHg exposure into fish intake and average mercury content of the consumed fish. We assessed the joint association of prenatal MeHg exposure (maternal hair mercury level) and fish intake (among fish-eating mothers) with neurodevelopment in 361 children aged 8 years from the New Bedford Cohort (New Bedford, Massachusetts; born in 1993-1998). Neurodevelopmental assessments used standardized tests of IQ, language, memory, and attention. Covariate-adjusted regression assessed the association of maternal fish consumption, stratified by tertile of estimated average fish mercury level, with neurodevelopment. Associations between maternal fish intake and child outcomes were generally beneficial for those in the lowest average fish mercury tertile but detrimental in the highest average fish mercury tertile, where, for example, each serving of fish was associated with 1.3 fewer correct responses (95% CI, -2.2 to -0.4) on the Boston Naming Test. Standard analyses showed no outcome associations with hair mercury level or fish intake. This article is part of a Special Collection on Environmental Epidemiology.

  • Bayesian functional data analysis in astronomy

    arXiv (Cornell University) · 2024-08-26 · 1 citation

    preprint · Open access · Senior author

    Cosmic demographics -- the statistical study of populations of astrophysical objects -- has long relied on *multivariate statistics*, providing methods for analyzing data comprising fixed-length vectors of properties of objects, as might be compiled in a tabular astronomical catalog (say, with sky coordinates, and brightness measurements in a fixed number of spectral passbands). But beginning with the emergence of automated digital sky surveys, ca. ~2000, astronomers began producing large collections of data with more complex structure: light curves (brightness time series) and spectra (brightness vs. wavelength). These comprise what statisticians call *functional data* -- measurements of populations of functions. Upcoming automated sky surveys will soon provide astronomers with a flood of functional data. New methods are needed to accurately and optimally analyze large ensembles of light curves and spectra, accumulating information both along and across measured functions. Functional data analysis (FDA) provides tools for statistical modeling of functional data. Astronomical data presents several challenges for FDA methodology, e.g., sparse, irregular, and asynchronous sampling, and heteroscedastic measurement error. Bayesian FDA uses hierarchical Bayesian models for function populations, and is well suited to addressing these challenges. We provide an overview of astronomical functional data, and of some key Bayesian FDA modeling approaches, including functional mixed effects models, and stochastic process models. We briefly describe a Bayesian FDA framework combining FDA and machine learning methods to build low-dimensional parametric models for galaxy spectra.

  • Characterization of extrasolar giant planets with machine learning

    Monthly Notices of the Royal Astronomical Society Letters · 2023-10-04

    article · Open access · Senior author

    More than 5000 extrasolar planets have already been detected. JWST and near-term ground-based telescopes like the Extremely Large Telescope (ELT), Giant Magellan Telescope (GMT), Thirty Meter Telescope (TMT), and upcoming telescopes such as the Nancy Grace Roman Space Telescope, Xuntian, and Ariel are designed to characterize the atmosphere of directly imaged Jovian planets. Here, we used five diverse machine learning algorithms to investigate how well broad-band filter photometric fluxes could initially characterize giant exoplanets. We use an established grid of 8813 reflected light model spectra of different metallicities, planet–star distances, and cloud properties to assess the performance of several machine learning algorithms on both noiseless and noisy data to provide classification and regression results as a function of signal to noise of the data. In all cases, the algorithms were tested on noisy validation data. The results show that the use of machine learning to characterize giant planets from reflected broad-band filter photometry provides a promising tool for initial characterization, with over 65 per cent accuracy in characterizing metallicity for signal-to-noise ratios (S/N) ≳ 30, over 80 per cent for cloud coverage for S/N ≳ 30. This approach will allow initial characterization for large surveys of giant exoplanets and prioritization for spectroscopy observations of a subset of these worlds.

  • Splines 'n Lines: Rest-frame galaxy spectral energy distributions via Bayesian functional data analysis

    arXiv (Cornell University) · 2023-10-30

    preprint · Open access · Senior author

    Survey-based measurements of the spectral energy distributions (SEDs) of galaxies have flux density estimates on badly misaligned grids in rest-frame wavelength. The shift to rest frame wavelength also causes estimated SEDs to have differing support. For many galaxies, there are sizeable wavelength regions with missing data. Finally, dim galaxies dominate typical samples and have noisy SED measurements, many near the limiting signal-to-noise level of the survey. These limitations of SED measurements shifted to the rest frame complicate downstream analysis tasks, particularly tasks requiring computation of functionals (e.g., weighted integrals) of the SEDs, such as synthetic photometry, quantifying SED similarity, and using SED measurements for photometric redshift estimation. We describe a hierarchical Bayesian framework, drawing on tools from functional data analysis, that models SEDs as a random superposition of smooth continuum basis functions (B-splines) and line features, comprising a finite-rank, nonstationary Gaussian process, measured with additive Gaussian noise. We apply this *Splines 'n Lines* (SnL) model to a collection of 678,239 galaxy SED measurements comprising the Main Galaxy Sample from the Sloan Digital Sky Survey, Data Release 17, demonstrating capability to provide continuous estimated SEDs that reliably denoise, interpolate, and extrapolate, with quantified uncertainty, including the ability to predict line features where there is missing data by leveraging correlations between line features and the entire continuum.

  • Measurement errors in semi‐parametric generalised regression models

    Australian & New Zealand Journal of Statistics · 2023-10-11

    article · Senior author

    Regression models that ignore measurement error in predictors may produce highly biased estimates leading to erroneous inferences. It is well known that it is extremely difficult to take measurement error into account in Gaussian non‐parametric regression. This problem becomes even more difficult when considering other families such as binary, Poisson and negative binomial regression. We present a novel method aiming to correct for measurement error when estimating regression functions. Our approach is sufficiently flexible to cover virtually all distributions and link functions regularly considered in generalised linear models. This approach depends on approximating the first and the second moment of the response after integrating out the true unobserved predictors in any semi‐parametric generalised regression model. By the latter is meant a model with both linear and non‐parametric effects that are connected to the mean response by a link function and with a response distribution in an exponential family or quasi‐likelihood model. Unlike previous methods, the method we now propose is not restricted to truncated splines and can utilise various basis functions. Moreover, it can operate without making any distributional assumption about the unobserved predictor. Through extensive simulation studies, we study the performance of our method under many scenarios.

  • Bayesian Functional Principal Components Analysis via Variational Message Passing with Multilevel Extensions

    Bayesian Analysis · 2023-08-08 · 2 citations

    article · Open access · Senior author

    Standard approaches for functional principal components analysis rely on an eigendecomposition of a smoothed covariance surface in order to extract the orthonormal eigenfunctions representing the major modes of variation in a set of functional data. This approach can be a computationally intensive procedure, especially in the presence of large datasets with irregular observations. In this article, we develop a variational Bayesian approach, which aims to determine the Karhunen-Loève decomposition directly without smoothing and estimating a covariance surface. More specifically, we incorporate the notion of variational message passing over a factor graph because it removes the need for rederiving approximate posterior density functions if there is a change in the model. Instead, model changes are handled by changing specific computational units, known as fragments, within the factor graph – we demonstrate this with an extension to multilevel functional data. Indeed, this is the first article to address a functional data model via variational message passing. Our approach introduces three new fragments that are necessary for Bayesian functional principal components analysis. We present the computational details, a set of simulations for assessing the accuracy and speed of the variational message passing algorithm and an application to United States temperature data.

  • Maximizing Portfolio Predictability with Machine Learning

    SSRN Electronic Journal · 2023-01-01

    article · Open access · Senior author

Education

  • PhD, Statistics and Probability

    Michigan State University

    1977

  • MA, Mathematics

    University of Vermont

    1973

  • BA, Mathematics

    Cornell University

    1970

Awards & honors

  • Wilcoxon Prize (1986)
  • Fellow of the ASA
  • Fellow of the IMS