
Christina Lee Yu
VerifiedCornell University · Operations Research and Information Engineering
Active 2014–2026
About
Christina Lee Yu is an Assistant Professor at Cornell University in the School of Operations Research and Information Engineering. Her research interests include algorithm design and analysis, high dimensional statistics, inference over networks, sequential decision making under uncertainty, online learning, and network causal inference. Prior to her position at Cornell, she was a postdoctoral researcher at Microsoft Research New England. She earned her PhD in 2017 and MS in 2013 in Electrical Engineering and Computer Science from the Massachusetts Institute of Technology, and her BS in Computer Science from the California Institute of Technology in 2011. Her work has been recognized with awards such as the 2021 Intel Rising Stars Award and a JPMorgan Faculty Research Award, and she has received support from grants from the National Science Foundation and the Air Force Office of Scientific Research.
Research topics
- Computer Science
- Artificial Intelligence
- Mathematics
- Mathematical optimization
- Economics
- Machine Learning
- Operations research
- Computer network
- Psychology
- Algorithm
- Mathematical economics
- Microeconomics
- Social psychology
Selected publications
Optimal Design under Interference, Homophily, and Robustness Trade-offs
Open MIND · 2026-01-23
preprintTo minimize the mean squared error (MSE) in global average treatment effect (GATE) estimation under network interference, a popular approach is to use a cluster-randomized design. However, in the presence of homophily, which is common in social networks, cluster randomization can instead increase the MSE. We develop a novel potential outcomes model that accounts for interference, homophily, and heterogeneous variation. In this setting, we establish a framework for optimizing designs for worst-case MSE under the Horvitz-Thompson estimator. This leads to an optimization problem over the covariance matrices of the treatment assignment, trading off interference, homophily, and robustness. We frame and solve this problem using two complementary approaches. The first involves formulating a semidefinite program (SDP) and employing Gaussian rounding, in the spirit of the Goemans-Williamson approximation algorithm for MAXCUT. The second is an adaptation of the Gram-Schmidt Walk, a vector-balancing algorithm which has recently received much attention. Finally, we evaluate the performance of our designs through various experiments on simulated network data and a real village network dataset.
Optimal Design under Interference, Homophily, and Robustness Trade-offs
ArXiv.org · 2026-01-23
articleOpen accessTo minimize the mean squared error (MSE) in global average treatment effect (GATE) estimation under network interference, a popular approach is to use a cluster-randomized design. However, in the presence of homophily, which is common in social networks, cluster randomization can instead increase the MSE. We develop a novel potential outcomes model that accounts for interference, homophily, and heterogeneous variation. In this setting, we establish a framework for optimizing designs for worst-case MSE under the Horvitz-Thompson estimator. This leads to an optimization problem over the covariance matrices of the treatment assignment, trading off interference, homophily, and robustness. We frame and solve this problem using two complementary approaches. The first involves formulating a semidefinite program (SDP) and employing Gaussian rounding, in the spirit of the Goemans-Williamson approximation algorithm for MAXCUT. The second is an adaptation of the Gram-Schmidt Walk, a vector-balancing algorithm which has recently received much attention. Finally, we evaluate the performance of our designs through various experiments on simulated network data and a real village network dataset.
The asymptotics of the expected Betti numbers of preferential attachment clique complexes
Advances in Applied Probability · 2025-03-10 · 4 citations
articleOpen accessCorrespondingAbstract The preferential attachment model is a natural and popular random graph model for a growing network that contains very well-connected ‘hubs’. We study the higher-order connectivity of such a network by investigating the topological properties of its clique complex. We concentrate on the Betti numbers, a sequence of topological invariants of the complex related to the numbers of holes (equivalently, repeated connections) of different dimensions. We prove that the expected Betti numbers grow sublinearly fast, with the trivial exceptions of those at dimensions 0 and 1. Our result also shows that preferential attachment graphs undergo infinitely many phase transitions within the parameter regime where the limiting degree distribution has an infinite variance. Regarding higher-order connectivity, our result shows that preferential attachment favors higher-order connectivity. We illustrate our theoretical results with simulations.
Experimentation Under Non-stationary Interference
ArXiv.org · 2025-11-10
preprintOpen accessSenior authorWe study the estimation of the ATE in randomized controlled trials under a dynamically evolving interference structure. This setting arises in applications such as ride-sharing, where drivers move over time, and social networks, where connections continuously form and dissolve. In particular, we focus on scenarios where outcomes exhibit spatio-temporal interference driven by a sequence of random interference graphs that evolve independently of the treatment assignment. Loosely, our main result states that a truncated Horvitz-Thompson estimator achieves an MSE that vanishes linearly in the number of spatial and time blocks, times a factor that measures the average complexity of the interference graphs. As a key technical contribution that contrasts the static setting we present a fine-grained covariance bound for each pair of space-time points that decays exponentially with the time elapsed since their last ``interaction''. Our results can be applied to many concrete settings and lead to simplified bounds, including where the interference graphs (i) are induced by moving points in a metric space, or (ii) follow a dynamic Erdos-Renyi model, where each edge is created or removed independently in each time period.
The SMART approach to instance-optimal online learning
arXiv (Cornell University) · 2024-02-27
preprintOpen accessSenior authorWe devise an online learning algorithm -- titled Switching via Monotone Adapted Regret Traces (SMART) -- that adapts to the data and achieves regret that is instance optimal, i.e., simultaneously competitive on every input sequence compared to the performance of the follow-the-leader (FTL) policy and the worst case guarantee of any other input policy. We show that the regret of the SMART policy on any input sequence is within a multiplicative factor $e/(e-1) \approx 1.58$ of the smaller of: 1) the regret obtained by FTL on the sequence, and 2) the upper bound on regret guaranteed by the given worst-case policy. This implies a strictly stronger guarantee than typical `best-of-both-worlds' bounds as the guarantee holds for every input sequence regardless of how it is generated. SMART is simple to implement as it begins by playing FTL and switches at most once during the time horizon to the worst-case algorithm. Our approach and results follow from an operational reduction of instance optimal online learning to competitive analysis for the ski-rental problem. We complement our competitive ratio upper bounds with a fundamental lower bound showing that over all input sequences, no algorithm can get better than a $1.43$-fraction of the minimum regret achieved by FTL and the minimax-optimal policy. We also present a modification of SMART that combines FTL with a ``small-loss" algorithm to achieve instance optimality between the regret of FTL and the small loss regret bound.
The Limits of Transfer Reinforcement Learning with Latent Low-rank Structure
arXiv (Cornell University) · 2024-10-28
preprintOpen accessSenior authorMany reinforcement learning (RL) algorithms are too costly to use in practice due to the large sizes $S, A$ of the problem's state and action space. To resolve this issue, we study transfer RL with latent low rank structure. We consider the problem of transferring a latent low rank representation when the source and target MDPs have transition kernels with Tucker rank $(S , d, A )$, $(S , S , d), (d, S, A )$, or $(d , d , d )$. In each setting, we introduce the transfer-ability coefficient $α$ that measures the difficulty of representational transfer. Our algorithm learns latent representations in each source MDP and then exploits the linear structure to remove the dependence on $S, A $, or $S A$ in the target MDP regret bound. We complement our positive results with information theoretic lower bounds that show our algorithms (excluding the ($d, d, d$) setting) are minimax-optimal with respect to $α$.
arXiv (Cornell University) · 2024-05-08 · 1 citations
preprintOpen accessSenior authorEstimating causal effects under interference is pertinent to many real-world settings. Recent work with low-order potential outcomes models uses a rollout design to obtain unbiased estimators that require no interference network information. However, the required extrapolation can lead to prohibitively high variance. To address this, we propose a two-stage experiment that selects a sub-population in the first stage and restricts treatment rollout to this sub-population in the second stage. We explore the role of clustering in the first stage by analyzing the bias and variance of a polynomial interpolation-style estimator under this experimental design. Bias increases with the number of edges cut in the clustering of the interference network, but variance depends on qualities of the clustering that relate to homophily and covariate balance. There is a tension between clustering objectives that minimize the number of cut edges versus those that maximize covariate balance across clusters. Through simulations, we explore a bias-variance trade-off and compare the performance of the estimator under different clustering strategies.
arXiv (Cornell University) · 2024-05-13 · 2 citations
preprintOpen accessSenior authorVariance reduction for causal inference in the presence of network interference is often achieved through either outcome modeling, typically analyzed under unit-randomized Bernoulli designs, or clustered experimental designs, typically analyzed without strong parametric assumptions. In this work, we study the intersection of these two approaches and make the following threefold contributions. First, we present an estimator of the total treatment effect (or global average treatment effect) in low-order outcome models when the data are collected under general experimental designs, generalizing previous results for Bernoulli designs. We refer to this estimator as the pseudoinverse estimator and give bounds on its bias and variance in terms of properties of the experimental design. Second, we evaluate these bounds for the case of Bernoulli graph cluster randomized (GCR) designs. Its variance scales like the smaller of the variance obtained by the estimator derived under a low-order assumption, and the variance obtained from cluster randomization, showing that combining these variance reduction strategies is preferable to using either individually. When the order of the potential outcomes model is correctly specified, our estimator is always unbiased, and under a misspecified model, we upper bound the bias by the closeness of the ground truth model to a low-order model. Third, we give empirical evidence that our variance bounds can be used to select a good clustering that minimizes the worst-case variance under a cluster randomized design from a set of candidate clusterings. Across a range of graphs and clustering algorithms, our method consistently selects clusterings that perform well on a range of response models, suggesting the practical use of our bounds.
Entry-Specific Matrix Estimation under Arbitrary Sampling Patterns through the Lens of Network Flows
arXiv (Cornell University) · 2024-09-06
preprintOpen accessSenior authorMatrix completion tackles the task of predicting missing values in a low-rank matrix based on a sparse set of observed entries. It is often assumed that the observation pattern is generated uniformly at random or has a very specific structure tuned to a given algorithm. There is still a gap in our understanding when it comes to arbitrary sampling patterns. Given an arbitrary sampling pattern, we introduce a matrix completion algorithm based on network flows in the bipartite graph induced by the observation pattern. For additive matrices, the particular flow we used is the electrical flow and we establish error upper bounds customized to each entry as a function of the observation set, along with matching minimax lower bounds. Our results show that the minimax squared error for recovery of a particular entry in the matrix is proportional to the effective resistance of the corresponding edge in the graph. Furthermore, we show that our estimator is equivalent to the least squares estimator. We apply our estimator to the two-way fixed effects model and show that it enables us to accurately infer individual causal effects and the unit-specific and time-specific confounders. For rank-$1$ matrices, we use edge-disjoint paths to form an estimator that achieves minimax optimal estimation when the sampling is sufficiently dense. Our discovery introduces a new family of estimators parametrized by network flows, which provide a fine-grained and intuitive understanding of the impact of the given sampling pattern on the relative difficulty of estimation at an entry-specific level. This graph-based approach allows us to quantify the inherent complexity of matrix completion for individual entries, rather than relying solely on global measures of performance.
Hierarchical Generalization Bounds for Deep Neural Networks
2024-07-07 · 1 citations
articleDeep neural networks (DNNs) exhibit an exceptional generalization capability in practice. This work aims to capture the effect of depth and its potential benefit for learning within the paradigm of information-theoretic generalization bounds. We derive two novel hierarchical bounds on the generalization error that explicitly depend on the internal representations within each layer. The first result, is a layer-dependent generalization bound in terms of the Kullback-Leibler (KL) divergence, which shrinks as the layer index increases. The second bound, which is based on the Wasserstein distance, implies the existence of a layer that serves as a generalization funnel, which minimizes the generalization bound. We then specialize our bounds to the case of binary Gaussian classification, and present analytic expressions dependent on weight matrices rank or certain norms, for the KL divergence and the Wasserstein bounds, respectively. Our results may provide a new perspective for understanding generalization in deep models.
Recent grants
CRII: CIF: Generalizations for Matrix and Tensor Estimation
NSF · $175k · 2020–2023
CNS Core: Medium: Resource Constrained Reinforcement Learning for Computing Systems
NSF · $1.2M · 2020–2026
Frequent coauthors
- 17 shared
Siddhartha Banerjee
Cornell University
- 16 shared
Sean R. Sinclair
- 14 shared
Devavrat Shah
- 9 shared
Yihua Li
- 7 shared
Yudong Chen
- 6 shared
Gauri Jain
Cornell University
- 6 shared
Tyler Sam
- 6 shared
Xumei Xi
Cornell University
Education
- 2017
PhD, Electrical Engineering and Computer Science
Massachusetts Institute of Technology
- 2013
MS, Electrical Engineering and Computer Science
Massachusetts Institute of Technology
- 2011
B.S., Computer Science
California Institute of Technology
Awards & honors
- Honorable Mention for the 2018 INFORMS Dantzig Dissertation…
- 2021 Intel Rising Stars Award
- JPMorgan Faculty Research Award (2021)
- NSF CAREER Award (2024)
- Simons Institute Research Fellow for Theory of Reinforcement…
- Resume-aware match score
- Save to shortlist
- AI-drafted outreach
See your match with Christina Lee Yu
PhdFit ranks faculty by your research interests, methods, and publications — grounded in their actual work, not templates.
- Free to start
- No credit card
- 30-second signup