Galen Reeves

Verified

Duke University · Electrical and Computer Engineering

Active 2007–2026

h-index 20

Citations 1.4k

Papers 109 · 28 last 5y

Funding $490k

Faculty page Lab page


About

Galen Reeves is an Associate Professor at Duke University with a joint appointment in the Department of Electrical and Computer Engineering and the Department of Statistical Science. He joined the faculty at Duke in Fall 2013. He completed his PhD in Electrical Engineering and Computer Sciences at the University of California, Berkeley in 2011, followed by a postdoctoral associate position in the Department of Statistics at Stanford University from 2011 to 2013. His research interests encompass information theory, statistics, machine learning, and signal processing, with a particular focus on fundamental questions regarding the amount of data needed for inference or learning tasks, the representation of uncertainty in high-dimensional and complex spaces, and the gap between practical and combinatorial methods. His research approach is highly interdisciplinary, drawing on ideas from engineering, statistics, mathematics, theoretical computer science, and statistical physics. Reeves received an NSF CAREER award in 2017 and is a co-recipient of the 2025 Information Theory Society Paper Award.

Research topics

Computer Science
Machine Learning
Artificial Intelligence
Mathematics
Statistics
Discrete mathematics
Telecommunications
Arithmetic
Algorithm

Selected publications

Reed--Muller Codes Achieve the Symmetric Capacity on Finite-State Channels
arXiv (Cornell University) · 2026-04-16
preprintOpen accessSenior author
We study reliable communication over finite-state channels (FSCs) using Reed--Muller (RM) codes. Building on recent symmetry-based analyses for memoryless channels, we show that a sequence of binary RM codes (with some random scrambling) can achieve the symmetric capacity (or uniform-input information rate) of a binary-input indecomposable FSC. Our approach has three components. First, we establish a capacity-via-symmetry theorem for doubly-transitive group codes on discrete memoryless channels (DMCs) with non-binary inputs, under some symmetry and puncturing conditions. Then, we reduce a binary-input FSC to an almost memoryless non-binary channel by grouping adjacent input bits into blocks and interleaving non-binary codes onto the channel. Finally, we show that the interleaved non-binary codes can be constructed from a single binary RM code.
Publisher DOI
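As a hedged illustration of the codes in this abstract (not of the paper's scrambled, interleaved construction), the sketch below builds a generator matrix for the binary Reed--Muller code RM(r, m). It evaluates every monomial of degree at most r at all 2^m points of F_2^m, then checks the dimension and the minimum distance 2^(m-r) on small cases by brute force. The helper names `rm_generator` and `codewords` are hypothetical.

```python
from itertools import combinations, product
from math import comb


def rm_generator(r: int, m: int) -> list[list[int]]:
    """Rows of a generator matrix for the binary Reed--Muller code RM(r, m).

    Each row evaluates one monomial (degree <= r) at every point of F_2^m,
    so every row has length 2^m.
    """
    points = list(product((0, 1), repeat=m))  # all 2^m evaluation points
    rows = []
    for degree in range(r + 1):
        for subset in combinations(range(m), degree):
            # A monomial evaluates to the AND of its selected coordinates;
            # the empty subset gives the all-ones row (the constant monomial 1).
            rows.append([int(all(p[i] for i in subset)) for p in points])
    return rows


def codewords(gen: list[list[int]]) -> list[list[int]]:
    """Enumerate every F_2 combination of the rows (feasible for small codes only)."""
    n = len(gen[0])
    words = []
    for coeffs in product((0, 1), repeat=len(gen)):
        word = [0] * n
        for c, row in zip(coeffs, gen):
            if c:
                word = [a ^ b for a, b in zip(word, row)]  # addition over F_2
        words.append(word)
    return words


r, m = 1, 3
G = rm_generator(r, m)
k = len(G)
assert k == sum(comb(m, i) for i in range(r + 1))  # dimension formula
dmin = min(sum(w) for w in codewords(G) if any(w))  # minimum nonzero weight
print(f"RM({r},{m}): n={2**m}, k={k}, rate={k / 2**m:.3f}, d_min={dmin}")
# prints "RM(1,3): n=8, k=4, rate=0.500, d_min=4"
```

RM(1, 3) is the [8, 4, 4] extended Hamming code, which makes it a convenient sanity check before trying larger parameters.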
Linear operator approximate message passing (OpAMP)
Information and Inference A Journal of the IMA · 2025-10-06
articleSenior author
This paper introduces a framework for approximate message passing (AMP) in dynamic settings where the data at each iteration is passed through a linear operator. This framework is motivated in part by applications in large-scale, distributed computing where only a subset of the data is available at each iteration. An autoregressive memory term is used to mitigate information loss across iterations and a specialized algorithm, called projection AMP, is designed for the case where each linear operator is an orthogonal projection. Precise theoretical guarantees are provided for a class of Gaussian matrices and non-separable denoising functions. Specifically, it is shown that the iterates can be well approximated in the high-dimensional limit by a Gaussian process whose second-order statistics are defined recursively via state evolution. These results are applied to the problem of estimating a rank-one spike corrupted by additive Gaussian noise using partial row updates, and the theory is validated by numerical simulations.
Publisher DOI
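State evolution is easiest to see in the classical setting that OpAMP generalizes. The sketch below is a hedged toy, not the OpAMP recursion from the paper. It assumes a symmetric rank-one spike whose entries are uniform on {-1, +1}, the Bayes-optimal tanh denoiser, and a normalization under which the spectral threshold is λ = 1. Under those assumptions the overlap q_t of standard AMP is commonly tracked by the scalar recursion q_{t+1} = E[tanh(λ q_t + sqrt(λ q_t) Z)] with Z ~ N(0, 1). The quadrature grid and the function names are illustrative assumptions.

```python
import math

# Quadrature for E[g(Z)], Z ~ N(0, 1): Riemann sum on [-10, 10] with step 0.01.
STEP = 0.01
GRID = [-10.0 + k * STEP for k in range(2001)]
WEIGHTS = [math.exp(-0.5 * z * z) * STEP / math.sqrt(2 * math.pi) for z in GRID]


def gauss_mean(g) -> float:
    """Approximate E[g(Z)] for a standard Gaussian Z."""
    return sum(w * g(z) for z, w in zip(GRID, WEIGHTS))


def state_evolution(lam: float, q0: float = 0.01, iters: int = 100) -> float:
    """Iterate q_{t+1} = E[tanh(lam * q_t + sqrt(lam * q_t) * Z)] (assumed toy model).

    q_t stands for the overlap between the AMP estimate and a {-1, +1}-valued
    spike. This is the standard AMP recursion, not OpAMP.
    """
    q = q0
    for _ in range(iters):
        s = lam * max(q, 0.0)  # effective SNR of the equivalent scalar Gaussian channel
        root = math.sqrt(s)
        q = gauss_mean(lambda z: math.tanh(s + root * z))
    return q


for lam in (0.5, 1.5, 3.0):
    print(f"lambda={lam}: q_final={state_evolution(lam):.4f}")
```

Below the threshold (λ < 1) the recursion decays to zero overlap from a small initialization. Above it, q_t climbs to a nontrivial fixed point. This scalar summary of a high-dimensional iteration is what state evolution provides.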
Information-Theoretic Proofs for Diffusion Sampling
2025-06-22 · 1 citation
article1st authorCorresponding
This paper provides an elementary, self-contained analysis of diffusion-based sampling methods for generative modeling. In contrast to existing approaches that rely on continuous-time processes and then discretize, our treatment works directly with discrete-time stochastic processes and yields precise non-asymptotic convergence guarantees under broad assumptions. The key insight is to couple the sampling process of interest with an idealized comparison process that has an explicit Gaussian-convolution structure. We then leverage simple identities from information theory, including the I-MMSE relationship, to bound the discrepancy (in terms of the Kullback-Leibler divergence) between these two discrete-time processes. In particular, we show that, if the diffusion step sizes are chosen sufficiently small and one can approximate certain conditional mean estimators well, then the sampling distribution is provably close to the target distribution. Our results also provide a transparent view on how to accelerate convergence by using additional randomness in each step to match higher-order moments in the comparison process.
Publisher DOI
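The I-MMSE relationship states that, for an input observed through an additive Gaussian channel at signal-to-noise ratio snr, the derivative of the mutual information (in nats) with respect to snr equals half the minimum mean-squared error. Below is a minimal numerical check. It assumes an input uniform on {-1, +1}, for which E[X | Y] = tanh(sqrt(snr) Y), and uses a simple Riemann-sum quadrature. It illustrates the identity itself, not the paper's sampling analysis, and the helper names are hypothetical.

```python
import math

# Riemann-sum quadrature for E[g(Z)], Z ~ N(0, 1), on [-10, 10].
STEP = 0.01
GRID = [-10.0 + k * STEP for k in range(2001)]
WEIGHTS = [math.exp(-0.5 * z * z) * STEP / math.sqrt(2 * math.pi) for z in GRID]


def gauss_mean(g) -> float:
    """Approximate E[g(Z)] for a standard Gaussian Z."""
    return sum(w * g(z) for z, w in zip(GRID, WEIGHTS))


def mutual_info(snr: float) -> float:
    """I(X; Y) in nats for Y = sqrt(snr) X + N, X uniform on {-1, +1}, N ~ N(0, 1)."""
    root = math.sqrt(snr)
    return snr - gauss_mean(lambda z: math.log(math.cosh(snr + root * z)))


def mmse(snr: float) -> float:
    """E[(X - E[X | Y])^2] for the same channel, using E[X | Y] = tanh(sqrt(snr) Y)."""
    root = math.sqrt(snr)
    return 1.0 - gauss_mean(lambda z: math.tanh(snr + root * z))


h = 1e-4
for snr in (0.5, 1.0, 2.0):
    # Central finite difference of I(snr) versus the I-MMSE prediction mmse / 2.
    deriv = (mutual_info(snr + h) - mutual_info(snr - h)) / (2 * h)
    print(f"snr={snr}: dI/dsnr={deriv:.6f}, mmse/2={0.5 * mmse(snr):.6f}")
```

The two printed columns should agree to the displayed precision. Identities of this kind are what allow an estimation error to be converted into an information-theoretic quantity.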
What happens when generative AI models train recursively on each others' outputs?
ArXiv.org · 2025-05-27 · 1 citation
preprintOpen access
The internet serves as a common source of training data for generative AI (genAI) models but is increasingly populated with AI-generated content. This duality raises the possibility that future genAI models may be trained on other models' generated outputs. Prior work has studied consequences of models training on their own generated outputs, but limited work has considered what happens if models ingest content produced by other models. Given society's increasing dependence on genAI tools, understanding such data-mediated model interactions is critical. This work provides empirical evidence for how data-mediated interactions might unfold in practice, develops a theoretical model for this interactive training process, and experimentally validates the theory. We find that data-mediated interactions can benefit models by exposing them to novel concepts perhaps missed in original training data, but also can homogenize their performance on shared tasks.
Publisher OA PDF DOI
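The homogenization effect can be made concrete with a deliberately crude toy. This is a hypothetical model, not the theoretical model developed in the paper. Each of two models is summarized by a single number (its "mean"). In every round, each model is refit on a fraction alpha of its own real data and a fraction 1 - alpha of the other model's previous outputs. In this linear toy, the gap between the models shrinks from mu_b - mu_a to alpha / (2 - alpha) times that value.

```python
def interact(mu_a: float, mu_b: float, alpha: float, rounds: int = 200):
    """Hypothetical toy: models A and B are each summarized by one number.

    Each round, a model is refit on a fraction `alpha` of its own real data
    (with mean mu_a or mu_b) and a fraction 1 - alpha of the other model's
    previous outputs. Returns the final pair (m_a, m_b).
    """
    m_a, m_b = mu_a, mu_b  # round 0: each model matches its own real data
    for _ in range(rounds):
        # Simultaneous update: each model trains on the other's last outputs.
        m_a, m_b = alpha * mu_a + (1 - alpha) * m_b, alpha * mu_b + (1 - alpha) * m_a
    return m_a, m_b


for alpha in (1.0, 0.5, 0.1):
    m_a, m_b = interact(0.0, 1.0, alpha)
    print(f"alpha={alpha}: gap={m_b - m_a:.4f}, predicted={alpha / (2 - alpha):.4f}")
```

With alpha = 1 (no shared data), the models keep their original gap. As alpha falls, more of each model's training data comes from the other model, and the two models become nearly indistinguishable.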
Statistical Limits for Finite-Rank Tensor Estimation
ArXiv.org · 2025-06-07
preprintOpen accessSenior author
This paper provides a unified framework for analyzing tensor estimation problems that allow for nonlinear observations, heteroskedastic noise, and covariate information. We study a general class of high-dimensional models where each observation depends on the interactions among a finite number of unknown parameters. Our main results provide asymptotically exact formulas for the mutual information (equivalently, the free energy) as well as the minimum mean-squared error in the Bayes-optimal setting. We then apply this framework to derive sharp characterizations of statistical thresholds for two novel scenarios: (1) tensor estimation in heteroskedastic noise that is independent but not identically distributed, and (2) higher-order assignment problems, where the goal is to recover an unknown permutation from tensor-valued observations.
Publisher OA PDF DOI
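Here is a tiny, hypothetical instance of a higher-order assignment problem: an order-3 tensor A and a copy B whose indices have been relabeled by a hidden permutation pi. The sketch recovers pi by brute force, maximizing the sum of A[i,j,k] B[s(i),s(j),s(k)] over all permutations s. This search is exponential in n. It is only meant to make the problem concrete, not to reproduce the Bayes-optimal analysis in the paper. In the noiseless case, the true permutation is the unique maximizer for a generic A, because the score gap equals half the squared distance between A and its relabeled copy. The function names and the Gaussian toy instance are assumptions.

```python
import itertools
import random


def make_instance(n: int, seed: int = 0, noise: float = 0.0):
    """Hypothetical toy instance: random order-3 tensor A, and B with indices
    relabeled by a hidden permutation pi (plus optional Gaussian noise)."""
    rng = random.Random(seed)
    idx = list(itertools.product(range(n), repeat=3))
    A = {t: rng.gauss(0.0, 1.0) for t in idx}
    pi = list(range(n))
    rng.shuffle(pi)
    # B[pi(i), pi(j), pi(k)] = A[i, j, k] + noise * standard Gaussian
    B = {(pi[i], pi[j], pi[k]): A[(i, j, k)] + noise * rng.gauss(0.0, 1.0)
         for (i, j, k) in idx}
    return A, B, pi


def recover(A, B, n: int):
    """Brute-force maximizer of sum_{ijk} A[i,j,k] * B[s(i), s(j), s(k)] over permutations s."""
    best, best_score = None, float("-inf")
    for s in itertools.permutations(range(n)):
        score = sum(a * B[(s[i], s[j], s[k])] for (i, j, k), a in A.items())
        if score > best_score:
            best, best_score = list(s), score
    return best


A, B, pi = make_instance(5, seed=1, noise=0.1)
print("hidden:", pi, "recovered:", recover(A, B, 5))
```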
Capacity on BMS Channels via Code Symmetry and Nesting
ArXiv.org · 2025-04-21
preprintOpen accessSenior author
The past decade has seen notable advances in our understanding of structured error-correcting codes, particularly binary Reed--Muller (RM) codes. While initial breakthroughs were for erasure channels based on symmetry, extending these results to the binary symmetric channel (BSC) and other binary memoryless symmetric (BMS) channels required new tools and conditions. Recent work uses nesting to obtain multiple weakly correlated "looks" that imply capacity-achieving performance under bit-MAP and block-MAP decoding. This paper revisits and extends past approaches, aiming to simplify proofs, unify insights, and remove unnecessary conditions. By leveraging powerful results from the analysis of Boolean functions, we derive recursive bounds using two or three looks at each stage. This gives bounds on the bit error probability that decay exponentially in the number of stages. For the BSC, we incorporate level-k inequalities and hypercontractive techniques to achieve the faster decay rate required for vanishing block error probability. The results are presented in a semitutorial style, providing both theoretical insights and practical implications for future research on structured codes.
Publisher OA PDF DOI
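"Capacity-achieving" means that a sequence of codes whose rates approach any value below capacity can be decoded with error probability tending to zero as the blocklength grows. Below is a minimal sketch of the two quantities being compared for the BSC: the capacity 1 - h2(p) and the rate of RM(r, m). The function names are hypothetical. The last helper only picks the largest order r whose rate lies below capacity, which says nothing about decoding performance at finite length.

```python
import math


def h2(p: float) -> float:
    """Binary entropy function in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def bsc_capacity(p: float) -> float:
    """Capacity (bits per channel use) of the BSC with crossover probability p."""
    return 1.0 - h2(p)


def rm_rate(r: int, m: int) -> float:
    """Rate of the binary Reed--Muller code RM(r, m): sum_{i <= r} C(m, i) / 2^m."""
    return sum(math.comb(m, i) for i in range(r + 1)) / 2 ** m


def largest_order_below_capacity(p: float, m: int):
    """Largest r with rate(RM(r, m)) strictly below BSC(p) capacity, or None."""
    cap = bsc_capacity(p)
    candidates = [r for r in range(m + 1) if rm_rate(r, m) < cap]
    return max(candidates) if candidates else None


p, m = 0.11, 10
print(f"C(BSC({p})) = {bsc_capacity(p):.4f}")
print("largest r with RM(r, 10) rate below capacity:", largest_order_below_capacity(p, m))
```

Because RM rates are coarse for fixed m (here they jump from 386/1024 to 638/1024), capacity results for RM codes are stated for sequences in which m grows and r is chosen so that the rate converges to a target below capacity.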
Fundamental Limits for High-Dimensional Factor Regression Models
2025-06-22
articleSenior author
Publisher DOI
Information-Theoretic Proofs for Diffusion Sampling
ArXiv.org · 2025-02-04
preprintOpen access1st authorCorresponding
This paper provides an elementary, self-contained analysis of diffusion-based sampling methods for generative modeling. In contrast to existing approaches that rely on continuous-time processes and then discretize, our treatment works directly with discrete-time stochastic processes and yields precise non-asymptotic convergence guarantees under broad assumptions. The key insight is to couple the sampling process of interest with an idealized comparison process that has an explicit Gaussian-convolution structure. We then leverage simple identities from information theory, including the I-MMSE relationship, to bound the discrepancy (in terms of the Kullback-Leibler divergence) between these two discrete-time processes. In particular, we show that, if the diffusion step sizes are chosen sufficiently small and one can approximate certain conditional mean estimators well, then the sampling distribution is provably close to the target distribution. Our results also provide a transparent view on how to accelerate convergence by using additional randomness in each step to match higher-order moments in the comparison process.
Publisher OA PDF DOI
Linear Operator Approximate Message Passing (OpAMP)
arXiv (Cornell University) · 2024-05-13 · 1 citation
preprintOpen accessSenior author
This paper introduces a framework for approximate message passing (AMP) in dynamic settings where the data at each iteration is passed through a linear operator. This framework is motivated in part by applications in large-scale, distributed computing where only a subset of the data is available at each iteration. An autoregressive memory term is used to mitigate information loss across iterations and a specialized algorithm, called projection AMP, is designed for the case where each linear operator is an orthogonal projection. Precise theoretical guarantees are provided for a class of Gaussian matrices and non-separable denoising functions. Specifically, it is shown that the iterates can be well-approximated in the high-dimensional limit by a Gaussian process whose second-order statistics are defined recursively via state evolution. These results are applied to the problem of estimating a rank-one spike corrupted by additive Gaussian noise using partial row updates, and the theory is validated by numerical simulations.
Publisher OA PDF DOI

Recent grants

CAREER: Theoretical Foundations for Probabilistic Models with Dense Random Matrices
NSF · $490k · 2018–2025

Frequent coauthors

Michael Gastpar · 27 shared
Vaishakhi Mayya · Duke University · 16 shared
Henry D. Pfister · 11 shared
Ilias Zadik · New York University · 7 shared
Jiaming Xu · 7 shared
Alexander Volfovsky · Duke University · 6 shared
Boyla O. Mainsah · Duke University · 6 shared
Willem van den Boom · Agency for Science, Technology and Research · 6 shared

Awards & honors

NSF CAREER award (2017)
