Resume-aware faculty matching

Find professors who actually fit you

Upload your resume. Four AI agents analyze your background, rank the faculty who fit, inspect their recent research, and help you draft outreach — grounded in their actual work, not templates.

Free to startNo credit cardCancel anytime
Top matches Balanced preset
Dr. Sarah Chen
Stanford · Interpretability · NLP
91
Dr. Marcus Holloway
MIT · Robotics · RL
84
Dr. Aisha Okonkwo
CMU · Fairness · HCI
82
Nova · Professor Researcher · re-ranking top 20…
Galen Reeves

Galen Reeves

Verified

Duke University · Electrical and Computer Engineering

Active 2007–2026

h-index20
Citations1.4k
Papers10928 last 5y
Funding$490k
See your match with Galen Reeves — sign in to PhdFit.Sign in

About

Galen Reeves is an Associate Professor at Duke University with a joint appointment in the Department of Electrical and Computer Engineering and the Department of Statistical Science. He joined the faculty at Duke in Fall 2013. He completed his PhD in Electrical Engineering and Computer Sciences at the University of California, Berkeley in 2011, followed by a postdoctoral associate position in the Departments of Statistics at Stanford University from 2011 to 2013. His research interests encompass information theory, statistics, machine learning, and signal processing, with a particular focus on fundamental questions regarding the amount of data needed for inference or learning tasks, the representation of uncertainty in high-dimensional and complex spaces, and the gap between practical and combinatorial methods. His research approach is highly interdisciplinary, drawing on ideas from engineering, statistics, mathematics, theoretical computer science, and statistical physics. Reeves has been recognized with the NSF CAREER award in 2017 and is a corecipient of the 2025 Information Theory Society Paper Award.

Research topics

  • Computer Science
  • Machine Learning
  • Artificial Intelligence
  • Mathematics
  • Statistics
  • Discrete mathematics
  • Telecommunications
  • Arithmetic
  • Algorithm

Selected publications

  • Reed--Muller Codes Achieve the Symmetric Capacity on Finite-State Channels

    arXiv (Cornell University) · 2026-04-16

    preprintOpen accessSenior author

    We study reliable communication over finite-state channels (FSCs) using Reed--Muller (RM) codes. Building on recent symmetry-based analyses for memoryless channels, we show that a sequence of binary RM codes (with some random scrambling) can achieve the symmetric capacity (or uniform-input information rate) of a binary-input indecomposable FSC. Our approach has three components. First, we establish a capacity-via-symmetry theorem for doubly-transitive group codes on discrete memoryless channels (DMCs) with non-binary inputs, under some symmetry and puncturing conditions. Then, we reduce a binary-input FSC to an almost memoryless non-binary channel by grouping adjacent input bits into blocks and interleaving non-binary codes onto the channel. Finally, we show that the interleaved non-binary codes can be constructed from a single binary RM code.

  • Reed--Muller Codes Achieve the Symmetric Capacity on Finite-State Channels

    ArXiv.org · 2026-04-16

    articleOpen accessSenior author

    We study reliable communication over finite-state channels (FSCs) using Reed--Muller (RM) codes. Building on recent symmetry-based analyses for memoryless channels, we show that a sequence of binary RM codes (with some random scrambling) can achieve the symmetric capacity (or uniform-input information rate) of a binary-input indecomposable FSC. Our approach has three components. First, we establish a capacity-via-symmetry theorem for doubly-transitive group codes on discrete memoryless channels (DMCs) with non-binary inputs, under some symmetry and puncturing conditions. Then, we reduce a binary-input FSC to an almost memoryless non-binary channel by grouping adjacent input bits into blocks and interleaving non-binary codes onto the channel. Finally, we show that the interleaved non-binary codes can be constructed from a single binary RM code.

  • Linear operator approximate message passing (OpAMP)

    Information and Inference A Journal of the IMA · 2025-10-06

    articleSenior author

    Abstract This paper introduces a framework for approximate message passing (AMP) in dynamic settings where the data at each iteration is passed through a linear operator. This framework is motivated in part by applications in large-scale, distributed computing where only a subset of the data is available at each iteration. An autoregressive memory term is used to mitigate information loss across iterations and a specialized algorithm, called projection AMP, is designed for the case where each linear operator is an orthogonal projection. Precise theoretical guarantees are provided for a class of Gaussian matrices and non-separable denoising functions. Specifically, it is shown that the iterates can be well approximated in the high-dimensional limit by a Gaussian process whose second-order statistics are defined recursively via state evolution. These results are applied to the problem of estimating a rank-one spike corrupted by additive Gaussian noise using partial row updates, and the theory is validated by numerical simulations.

  • Information-Theoretic Proofs for Diffusion Sampling

    2025-06-22 · 1 citations

    article1st authorCorresponding

    This paper provides an elementary, self-contained analysis of diffusion-based sampling methods for generative modeling. In contrast to existing approaches that rely on continuous-time processes and then discretize, our treatment works directly with discrete-time stochastic processes and yields precise non-asymptotic convergence guarantees under broad assumptions. The key insight is to couple the sampling process of interest with an idealized comparison process that has an explicit Gaussian-convolution structure. We then leverage simple identities from information theory, including the I- MMSE relationship, to bound the discrepancy (in terms of the Kullback-Leibler divergence) between these two discrete-time processes. In particular, we show that, if the diffusion step sizes are chosen sufficiently small and one can approximate certain conditional mean estimators well, then the sampling distribution is provably close to the target distribution. Our results also provide a transparent view on how to accelerate convergence by using additional randomness in each step to match higher-order moments in the comparison process.

  • What happens when generative AI models train recursively on each others' outputs?

    ArXiv.org · 2025-05-27 · 1 citations

    preprintOpen access

    The internet serves as a common source of training data for generative AI (genAI) models but is increasingly populated with AI-generated content. This duality raises the possibility that future genAI models may be trained on other models' generated outputs. Prior work has studied consequences of models training on their own generated outputs, but limited work has considered what happens if models ingest content produced by other models. Given society's increasing dependence on genAI tools, understanding such data-mediated model interactions is critical. This work provides empirical evidence for how data-mediated interactions might unfold in practice, develops a theoretical model for this interactive training process, and experimentally validates the theory. We find that data-mediated interactions can benefit models by exposing them to novel concepts perhaps missed in original training data, but also can homogenize their performance on shared tasks.

  • Statistical Limits for Finite-Rank Tensor Estimation

    ArXiv.org · 2025-06-07

    preprintOpen accessSenior author

    This paper provides a unified framework for analyzing tensor estimation problems that allow for nonlinear observations, heteroskedastic noise, and covariate information. We study a general class of high-dimensional models where each observation depends on the interactions among a finite number of unknown parameters. Our main results provide asymptotically exact formulas for the mutual information (equivalently, the free energy) as well as the minimum mean-squared error in the Bayes-optimal setting. We then apply this framework to derive sharp characterizations of statistical thresholds for two novel scenarios: (1) tensor estimation in heteroskedastic noise that is independent but not identically distributed, and (2) higher-order assignment problems, where the goal is to recover an unknown permutation from tensor-valued observations.

  • Capacity on BMS Channels via Code Symmetry and Nesting

    ArXiv.org · 2025-04-21

    preprintOpen accessSenior author

    The past decade has seen notable advances in our understanding of structured error-correcting codes, particularly binary Reed--Muller (RM) codes. While initial breakthroughs were for erasure channels based on symmetry, extending these results to the binary symmetric channel (BSC) and other binary memoryless symmetric (BMS) channels required new tools and conditions. Recent work uses nesting to obtain multiple weakly correlated "looks" that imply capacity-achieving performance under bit-MAP and block-MAP decoding. This paper revisits and extends past approaches, aiming to simplify proofs, unify insights, and remove unnecessary conditions. By leveraging powerful results from the analysis of boolean functions, we derive recursive bounds using two or three looks at each stage. This gives bounds on the bit error probability that decay exponentially in the number of stages. For the BSC, we incorporate level-k inequalities and hypercontractive techniques to achieve the faster decay rate required for vanishing block error probability. The results are presented in a semitutorial style, providing both theoretical insights and practical implications for future research on structured codes.

  • Fundamental Limits for High-Dimensional Factor Regression Models

    2025-06-22

    articleSenior author
  • Information-Theoretic Proofs for Diffusion Sampling

    ArXiv.org · 2025-02-04

    preprintOpen access1st authorCorresponding

    This paper provides an elementary, self-contained analysis of diffusion-based sampling methods for generative modeling. In contrast to existing approaches that rely on continuous-time processes and then discretize, our treatment works directly with discrete-time stochastic processes and yields precise non-asymptotic convergence guarantees under broad assumptions. The key insight is to couple the sampling process of interest with an idealized comparison process that has an explicit Gaussian-convolution structure. We then leverage simple identities from information theory, including the I-MMSE relationship, to bound the discrepancy (in terms of the Kullback-Leibler divergence) between these two discrete-time processes. In particular, we show that, if the diffusion step sizes are chosen sufficiently small and one can approximate certain conditional mean estimators well, then the sampling distribution is provably close to the target distribution. Our results also provide a transparent view on how to accelerate convergence by using additional randomness in each step to match higher-order moments in the comparison process.

  • Linear Operator Approximate Message Passing (OpAMP)

    arXiv (Cornell University) · 2024-05-13 · 1 citations

    preprintOpen accessSenior author

    This paper introduces a framework for approximate message passing (AMP) in dynamic settings where the data at each iteration is passed through a linear operator. This framework is motivated in part by applications in large-scale, distributed computing where only a subset of the data is available at each iteration. An autoregressive memory term is used to mitigate information loss across iterations and a specialized algorithm, called projection AMP, is designed for the case where each linear operator is an orthogonal projection. Precise theoretical guarantees are provided for a class of Gaussian matrices and non-separable denoising functions. Specifically, it is shown that the iterates can be well-approximated in the high-dimensional limit by a Gaussian process whose second-order statistics are defined recursively via state evolution. These results are applied to the problem of estimating a rank-one spike corrupted by additive Gaussian noise using partial row updates, and the theory is validated by numerical simulations.

Recent grants

Frequent coauthors

Awards & honors

  • NSF CAREER award (2017)
  • Resume-aware match score
  • Save to shortlist
  • AI-drafted outreach

See your match with Galen Reeves

PhdFit ranks faculty by your research interests, methods, and publications — grounded in their actual work, not templates.

  • Free to start
  • No credit card
  • 30-second signup