David Donoho

Verified

Stanford University · Statistics

Active 1981–2026

h-index: 119
Citations: 164.3k
Papers: 347 (29 in the last 5 years)
Funding: $701k
About

I have studied the exploitation of sparse signals in signal recovery, including for denoising, superresolution, and solution of underdetermined equations. This research with collaborators showed that ell-1 penalization was an effective and even optimal way to exploit sparsity of the object to be recovered. Compressed sensing has impacted scientific and technical fields, including magnetic resonance imaging in medicine, where it has been implemented in FDA-approved medical imaging protocols already used for millions of patient MRIs. In recent years, my postdocs and students have been studying large-scale covariance matrix estimation, large-scale matrix denoising, detection of rare and weak signals among many pure noise non-signals, compressed sensing and related scientific imaging problems, and most recently, empirical deep learning.

Research topics

  • Mathematics
  • Computer science
  • Algorithm
  • Artificial intelligence
  • Combinatorics

Selected publications

  • "Rebuilding" Statistics in the Age of AI: A Town Hall Discussion on Culture, Infrastructure, and Training

    ArXiv.org · 2026-01-24

    article · Open access · 1st author · Corresponding

    This article presents the full, original record of the 2024 Joint Statistical Meetings (JSM) town hall, "Statistics in the Age of AI," which convened leading statisticians to discuss how the field is evolving in response to advances in artificial intelligence, foundation models, large-scale empirical modeling, and data-intensive infrastructures. The town hall was structured around open panel discussion and extensive audience Q&A, with the aim of eliciting candid, experience-driven perspectives rather than formal presentations or prepared statements. This document preserves the extended exchanges among panelists and audience members, with minimal editorial intervention, and organizes the conversation around five recurring questions concerning disciplinary culture and practices, data curation and "data work," engagement with modern empirical modeling, training for large-scale AI applications, and partnerships with key AI stakeholders. By providing an archival record of this discussion, the preprint aims to support transparency, community reflection, and ongoing dialogue about the evolving role of statistics in the data- and AI-centric future.

  • Hybrid BAG-seq: DNA and RNA from the same single nucleus reveals interactions between genomic and transcriptomic landscapes in human tumor samples

    Genome biology · 2025-09-27

    article · Open access

    We introduce hybrid BAG-seq: a high-throughput, multi-omic method that simultaneously captures DNA and RNA from single nuclei. We apply this protocol to 65,499 single nuclei from samples of five uterine cancer patients and validate the clustering using RNA-only and DNA-only protocols from the same tissues. Multiple tumor genome or expression clusters are often present within a patient, with different tumor clones projecting into distinct or shared expression states, demonstrating nearly all possible genome-transcriptome correlations. We also identify mutant stroma with significant X chromosome loss in various cell types and patient-specific stromal subtypes exhibiting aberrant expression patterns.

  • Statistics and AI: A Fireside Conversation

    UNC Libraries · 2025-05-08

    article · Open access

    A 3-hour webinar titled “Statistics and AI – A Fireside Conversation” was held on Sunday, March 17, 2024, attracting an online audience of approximately 1,000. The event featured three sessions aimed at engaging the statistical community on key topics in the AI era: addressing statistical challenges and opportunities (Panel I), evolving the publication process (Panel II), and advancing next-generation statistical pipelines and resources (Panel III). Panel I examined issues such as dwindling talent, shifting funding landscapes, and AI's rapid rise, highlighting the need for statistical rigor, interdisciplinary collaboration, and innovative approaches to shape the future of AI. Panel II emphasized the importance of streamlining the publication process, fostering impactful research, and prioritizing workflows and data quality. Panel III focused on modernizing statistical education by integrating AI and deep learning, promoting interdisciplinary collaboration, and maintaining foundational principles such as uncertainty and reproducibility. These discussions collectively outlined a strategic roadmap for ensuring the relevance and advancement of statistics in the age of AI. These discussions were organized by (in alphabetical order) Xihong Lin (Harvard University), Tracy Ke (Harvard University), Tian Zheng (Columbia University), Jing Zhou (University of California at Los Angeles), and Hongtu Zhu (University of North Carolina at Chapel Hill). In the dynamic landscape of statistical science, the fireside chat organized by the Stats Up AI Alliance (https://statsupai.org/) and the International Chinese Statistical Association (ICSA) emerged as a seminal event, bringing together leading experts to explore the evolving role of statistics in the era of artificial intelligence.

  • Statistics and AI: A Fireside Conversation

    Harvard Data Science Review · 2025-04-30 · 2 citations

    article · Open access

    A 3-hour webinar titled “Statistics and AI – A Fireside Conversation” was held on Sunday, March 17, 2024, attracting an online audience of approximately 1,000. The event featured three sessions aimed at engaging the statistical community on key topics in the AI era: addressing statistical challenges and opportunities (Panel I), evolving the publication process (Panel II), and advancing next-generation statistical pipelines and resources (Panel III). Panel I examined issues such as dwindling talent, shifting funding landscapes, and AI's rapid rise, highlighting the need for statistical rigor, interdisciplinary collaboration, and innovative approaches to shape the future of AI. Panel II emphasized the importance of streamlining the publication process, fostering impactful research, and prioritizing workflows and data quality. Panel III focused on modernizing statistical education by integrating AI and deep learning, promoting interdisciplinary collaboration, and maintaining foundational principles such as uncertainty and reproducibility. These discussions collectively outlined a strategic roadmap for ensuring the relevance and advancement of statistics in the age of AI. These discussions were organized by (in alphabetical order) Xihong Lin (Harvard University), Tracy Ke (Harvard University), Tian Zheng (Columbia University), Jing Zhou (University of California at Los Angeles), and Hongtu Zhu (University of North Carolina at Chapel Hill). In the dynamic landscape of statistical science, the fireside chat organized by the Stats Up AI Alliance (https://statsupai.org/) and the International Chinese Statistical Association (ICSA) emerged as a seminal event, bringing together leading experts to explore the evolving role of statistics in the era of artificial intelligence.

  • Data Science at the Singularity

    Harvard Data Science Review · 2024-01-29 · 42 citations

    article · Open access · 1st author · Corresponding

    Something fundamental to computation-based research has really changed in the last ten years. In certain fields, progress is simply dramatically more rapid than previously. Researchers in affected fields are living through a period of profound transformation, as the fields undergo a transition to frictionless reproducibility (FR). This transition markedly changes the rate of spread of ideas and practices, affects scientific mindsets and the goals of science, and erases memories of much that came before. The emergence of FR flows from 3 data science principles that matured together after decades of work by many technologists and numerous research communities. The mature principles involve data sharing, code sharing, and competitive challenges, however implemented in the particularly strong form of frictionless open services. Empirical Machine Learning is today’s leading adherent field; its hidden superpower is adherence to frictionless reproducibility practices; these practices are responsible for the striking and surprising progress in AI that we see everywhere; they can be learned and adhered to by researchers in whatever research field, automatically increasing the rate of progress in each adherent field.

  • Optimal Covariance Estimation for Condition Number Loss in the Spiked model

    Econometrics and Statistics · 2024-05-01 · 1 citation

    article · 1st author · Corresponding

  • Principled and interpretable alignability testing and integration of single-cell data

    Proceedings of the National Academy of Sciences · 2024-02-28 · 22 citations

    article · Open access

    Single-cell data integration can provide a comprehensive molecular view of cells, and many algorithms have been developed to remove unwanted technical or biological variations and integrate heterogeneous single-cell datasets. Despite their wide usage, existing methods suffer from several fundamental limitations. In particular, we lack a rigorous statistical test for whether two high-dimensional single-cell datasets are alignable (and therefore should even be aligned). Moreover, popular methods can substantially distort the data during alignment, making the aligned data and downstream analysis difficult to interpret. To overcome these limitations, we present a spectral manifold alignment and inference (SMAI) framework, which enables principled and interpretable alignability testing and structure-preserving integration of single-cell data with the same type of features. SMAI provides a statistical test to robustly assess the alignability between datasets to avoid misleading inference and is justified by high-dimensional statistical theory. On a diverse range of real and simulated benchmark datasets, it outperforms commonly used alignment methods. Moreover, we show that SMAI improves various downstream analyses such as identification of differentially expressed genes and imputation of single-cell spatial transcriptomics, providing further biological insights. SMAI's interpretability also enables quantification and a deeper understanding of the sources of technical confounders in single-cell data.

  • Rejoinder to Discussion of "Data Science at the Singularity"

    Harvard Data Science Review · 2024-07-11 · 1 citation

    article · Open access · 1st author · Corresponding

    I am impressed by the number, diversity, and seriousness of the discussions. I sense general agreement about the data science reality that has been forming over the last decades, some of the larger forces driving it, and the resulting permanent changes to research computing and scientific publishing that will ensue. I also sense concerns and important reservations, maybe not so much about what my article says, but what it does not begin to acknowledge and discuss. Each discussant makes unique and valuable points about issues exposed by these rapid changes across a broad range of fields represented and topics discussed. I can only admire and celebrate these contributions. In this rejoinder I will refer to the original

  • Universality of the $π^2/6$ Pathway in Avoiding Model Collapse

    arXiv (Cornell University) · 2024-10-30

    preprint · Open access · Senior author

    Researchers in empirical machine learning recently spotlighted their fears of so-called Model Collapse. They imagined a discard workflow, where an initial generative model is trained with real data, after which the real data are discarded, and subsequently, the model generates synthetic data on which a new model is trained. They came to the conclusion that models degenerate as model-fitting generations proceed. However, other researchers considered an augment workflow, where the original real data continue to be used in each generation of training, augmented by synthetic data from models fit in all earlier generations. Empirical results on canonical datasets and learning procedures confirmed the occurrence of model collapse under the discard workflow and avoidance of model collapse under the augment workflow. Under the augment workflow, theoretical evidence also confirmed avoidance in particular instances; specifically, Gerstgrasser et al. (2024) found that for classical Linear Regression, test risk at any later generation is bounded by a moderate multiple, viz. pi-squared-over-6 of the test risk of training with the original real data alone. Some commentators questioned the generality of theoretical conclusions based on the generative model assumed in Gerstgrasser et al. (2024): could similar conclusions be reached for other task/model pairings? In this work, we demonstrate the universality of the pi-squared-over-6 augment risk bound across a large family of canonical statistical models, offering key insights into exactly why collapse happens under the discard workflow and is avoided under the augment workflow. In the process, we provide a framework that is able to accommodate a large variety of workflows (beyond discard and augment), thereby enabling an experimenter to judge the comparative merits of multiple different workflows by simulating a simple Gaussian process.
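
The discard and augment workflows described in the last abstract above can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the paper's framework: the data are N(0, 1), each "model" is just the sample mean of its training set, and the names `simulate` and `augment_multiplier` are hypothetical. In this toy setting the squared error of the fitted mean, scaled by the per-generation sample size n, grows roughly linearly with the number of generations under discard, while under augment it follows the partial sums of 1/k^2, which stay below pi^2/6.

```python
import math
import random


def augment_multiplier(generations: int) -> float:
    """Partial sum 1/1^2 + 1/2^2 + ... + 1/g^2 (bounded above by pi^2/6)."""
    return sum(1.0 / k**2 for k in range(1, generations + 1))


def simulate(workflow: str, generations: int, n: int, trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of n * E[(mu_hat - mu)^2] after `generations` fits.

    Toy model (an assumption of this sketch): real data are N(mu, 1) with mu = 0,
    and each fitted "model" is the sample mean of its training pool.
    """
    rng = random.Random(seed)
    total_sq_err = 0.0
    for _ in range(trials):
        # Generation 1 always trains on n real samples.
        pool = [rng.gauss(0.0, 1.0) for _ in range(n)]
        mu_hat = sum(pool) / len(pool)
        for _ in range(generations - 1):
            # The current model generates n synthetic samples.
            synthetic = [rng.gauss(mu_hat, 1.0) for _ in range(n)]
            if workflow == "discard":
                pool = synthetic         # all earlier data are thrown away
            elif workflow == "augment":
                pool = pool + synthetic  # real and earlier synthetic data are kept
            else:
                raise ValueError(f"unknown workflow: {workflow}")
            mu_hat = sum(pool) / len(pool)
        total_sq_err += mu_hat**2  # true mean is 0
    return n * total_sq_err / trials


if __name__ == "__main__":
    g = 8
    print("discard :", simulate("discard", g, n=10, trials=2000))
    print("augment :", simulate("augment", g, n=10, trials=2000))
    print("sum 1/k^2 up to g:", augment_multiplier(g), " pi^2/6:", math.pi**2 / 6)
```

In this sketch the augment estimate changes by the new batch's noise divided by the number of batches seen so far, which is where the 1/k^2 terms come from; the discard estimate absorbs each batch's noise at full weight.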

Education

  • Ph.D., Statistics

    Harvard University

    1984

  • AB, Statistics

    Princeton University

    1978

Awards & honors

  • 2022 IEEE Jack S. Kilby Signal Processing Medal