Sekhar Tatikonda

Associate Professor of Statistics & Data Science · Verified

Yale University · Department of Statistics and Data Science

Active 1999–2025

h-index: 25
Citations: 4.7k
Papers: 137 (13 in the last 5 years)
Funding: $2.6M

Research topics

  • Computer Science
  • Artificial Intelligence
  • Algorithm
  • Machine Learning
  • Mathematical optimization
  • Applied mathematics
  • Mathematics

Selected publications

  • Random projections beyond zero overlap

    Electronic Journal of Probability · 2025-01-01

    article · Open access · Senior author

    Define the overlap of a random vector to be its inner product with an independent copy. A random vector whose Euclidean norm and overlap concentrate is shown to have random low-dimensional projections that are approximately random Gaussians. Conversely, asymptotically random Gaussian projections imply these hypotheses. This extends and unites several existing results in geometric functional analysis and spin glasses. Applications include a large-system characterization of the joint law of cavity fields in the Sherrington-Kirkpatrick model.

  • Random projections beyond zero overlap

    arXiv (Cornell University) · 2023-12-02

    preprint · Open access · Senior author

    A random vector whose norm and overlap (inner product with an independent copy) concentrate is shown to have random low-dimensional projections that are approximately random Gaussians. Conversely, asymptotically random Gaussian projections imply these hypotheses. This extends and unites several existing results in geometric functional analysis and spin glasses. Applications include a large-system characterization of the joint law of cavity fields in the Sherrington-Kirkpatrick model.

  • Local independence in mean-field spin glasses

    arXiv (Cornell University) · 2022-12-30

    preprint · Open access · Senior author

    We present a new approach to local independence in spin glasses, i.e. the phenomenon that any fixed subset of coordinates is asymptotically independent in the thermodynamic limit. The approach generalizes the rigorous cavity method from Talagrand by considering multiple cavity sites. Under replica-symmetric conditions of thin-shell and overlap concentration, the cavity fields are revealed to be asymptotically independent, conditionally on the disorder, which in turn leads to local independence. Conversely, it is shown that local independence implies those replica-symmetric properties. The framework is general enough to encompass the classical and soft spin ($[-1,1]$) Sherrington-Kirkpatrick models, as well as the Gardner spin glasses.

  • Surrogate Gap Minimization Improves Sharpness-Aware Training

    arXiv (Cornell University) · 2022-03-15 · 13 citations

    preprint · Open access

    The recently proposed Sharpness-Aware Minimization (SAM) improves generalization by minimizing a perturbed loss defined as the maximum loss within a neighborhood in the parameter space. However, we show that both sharp and flat minima can have a low perturbed loss, implying that SAM does not always prefer flat minima. Instead, we define a surrogate gap, a measure equivalent to the dominant eigenvalue of Hessian at a local minimum when the radius of the neighborhood (to derive the perturbed loss) is small. The surrogate gap is easy to compute and feasible for direct minimization during training. Based on the above observations, we propose Surrogate Gap Guided Sharpness-Aware Minimization (GSAM), a novel improvement over SAM with negligible computation overhead. Conceptually, GSAM consists of two steps: 1) a gradient descent like SAM to minimize the perturbed loss, and 2) an ascent step in the orthogonal direction (after gradient decomposition) to minimize the surrogate gap and yet not affect the perturbed loss. GSAM seeks a region with both small loss (by step 1) and low sharpness (by step 2), giving rise to a model with high generalization capabilities. Theoretically, we show the convergence of GSAM and provably better generalization than SAM. Empirically, GSAM consistently improves generalization (e.g., +3.2% over SAM and +5.4% over AdamW on ImageNet top-1 accuracy for ViT-B/32). Code is released at https://sites.google.com/view/gsam-iclr22/home.

  • Multiple-Shooting Adjoint Method for Whole-Brain Dynamic Causal Modeling

    Lecture notes in computer science · 2021-01-01 · 5 citations

    book-chapter

  • MALI: A memory efficient and reverse accurate integrator for Neural ODEs

    arXiv (Cornell University) · 2021-02-09 · 11 citations

    preprint · Open access

    Neural ordinary differential equations (Neural ODEs) are a new family of deep-learning models with continuous depth. However, the numerical estimation of the gradient in the continuous case is not well solved: existing implementations of the adjoint method suffer from inaccuracy in reverse-time trajectory, while the naive method and the adaptive checkpoint adjoint method (ACA) have a memory cost that grows with integration time. In this project, based on the asynchronous leapfrog (ALF) solver, we propose the Memory-efficient ALF Integrator (MALI), which has a constant memory cost w.r.t. number of solver steps in integration similar to the adjoint method, and guarantees accuracy in reverse-time trajectory (hence accuracy in gradient estimation). We validate MALI in various tasks: on image recognition tasks, to our knowledge, MALI is the first to enable feasible training of a Neural ODE on ImageNet and outperform a well-tuned ResNet, while existing methods fail due to either heavy memory burden or inaccuracy; for time series modeling, MALI significantly outperforms the adjoint method; and for continuous generative models, MALI achieves new state-of-the-art performance. We provide a pypi package at https://jzkay12.github.io/TorchDiffEqPack/

  • Momentum Centering and Asynchronous Update for Adaptive Gradient Methods

    arXiv (Cornell University) · 2021-10-11 · 1 citation

    preprint · Open access

    We propose ACProp (Asynchronous-centering-Prop), an adaptive optimizer which combines centering of second momentum and asynchronous update (e.g. for $t$-th update, denominator uses information up to step $t-1$, while numerator uses gradient at $t$-th step). ACProp has both strong theoretical properties and empirical performance. With the example by Reddi et al. (2018), we show that asynchronous optimizers (e.g. AdaShift, ACProp) have weaker convergence condition than synchronous optimizers (e.g. Adam, RMSProp, AdaBelief); within asynchronous optimizers, we show that centering of second momentum further weakens the convergence condition. We demonstrate that ACProp has a convergence rate of $O(\frac{1}{\sqrt{T}})$ for the stochastic non-convex case, which matches the oracle rate and outperforms the $O(\frac{\log T}{\sqrt{T}})$ rate of RMSProp and Adam. We validate ACProp in extensive empirical studies: ACProp outperforms both SGD and other adaptive optimizers in image classification with CNN, and outperforms well-tuned adaptive optimizers in the training of various GAN models, reinforcement learning and transformers. To sum up, ACProp has good theoretical properties including weak convergence condition and optimal convergence rate, and strong empirical performance including good generalization like SGD and training stability like Adam. We provide the implementation at https://github.com/juntang-zhuang/ACProp-Optimizer.

  • MALI: A memory efficient and reverse accurate integrator for Neural ODEs

    arXiv (Cornell University) · 2021-05-03 · 7 citations

    article · Open access

    Neural ordinary differential equations (Neural ODEs) are a new family of deep-learning models with continuous depth. However, the numerical estimation of the gradient in the continuous case is not well solved: existing implementations of the adjoint method suffer from inaccuracy in reverse-time trajectory, while the naive method and the adaptive checkpoint adjoint method (ACA) have a memory cost that grows with integration time. In this project, based on the asynchronous leapfrog (ALF) solver, we propose the Memory-efficient ALF Integrator (MALI), which has a constant memory cost w.r.t integration time similar to the adjoint method, and guarantees accuracy in reverse-time trajectory (hence accuracy in gradient estimation). We validate MALI in various tasks: on image recognition tasks, to our knowledge, MALI is the first to enable feasible training of a Neural ODE on ImageNet and outperform a well-tuned ResNet, while existing methods fail due to either heavy memory burden or inaccuracy; for time series modeling, MALI significantly outperforms the adjoint method; and for continuous generative models, MALI achieves new state-of-the-art performance. We provide a pypi package: https://jzkay12.github.io/TorchDiffEqPack

  • Multiple-shooting adjoint method for whole-brain dynamic causal modeling

    arXiv (Cornell University) · 2021-02-14 · 3 citations

    preprint · Open access

    Dynamic causal modeling (DCM) is a Bayesian framework to infer directed connections between compartments, and has been used to describe the interactions between underlying neural populations based on functional neuroimaging data. DCM is typically analyzed with the expectation-maximization (EM) algorithm. However, because the inversion of a large-scale continuous system is difficult when noisy observations are present, DCM by EM is typically limited to a small number of compartments ($<10$). Another drawback with the current method is its complexity; when the forward model changes, the posterior mean changes, and we need to re-derive the algorithm for optimization. In this project, we propose the Multiple-Shooting Adjoint (MSA) method to address these limitations. MSA uses the multiple-shooting method for parameter estimation in ordinary differential equations (ODEs) under noisy observations, and is suitable for large-scale systems such as whole-brain analysis in functional MRI (fMRI). Furthermore, MSA uses the adjoint method for accurate gradient estimation in the ODE; since the adjoint method is generic, MSA is a generic method for both linear and non-linear systems, and does not require re-derivation of the algorithm as in EM. We validate MSA in extensive experiments: 1) in toy examples with both linear and non-linear models, we show that MSA achieves better accuracy in parameter value estimation than EM; furthermore, MSA can be successfully applied to large systems with up to 100 compartments; and 2) using real fMRI data, we apply MSA to the estimation of the whole-brain effective connectome and show improved classification of autism spectrum disorder (ASD) vs. control compared to using the functional connectome. The package is provided at https://jzkay12.github.io/TorchDiffEqPack

  • AdaBelief Optimizer: Adapting Stepsizes by the Belief in Observed Gradients

    arXiv (Cornell University) · 2020 · 220 citations

    • Computer Science
    • Artificial Intelligence

    Most popular optimizers for deep learning can be broadly categorized as adaptive methods (e.g. Adam) and accelerated schemes (e.g. stochastic gradient descent (SGD) with momentum). For many models such as convolutional neural networks (CNNs), adaptive methods typically converge faster but generalize worse compared to SGD; for complex settings such as generative adversarial networks (GANs), adaptive methods are typically the default because of their stability. We propose AdaBelief to simultaneously achieve three goals: fast convergence as in adaptive methods, good generalization as in SGD, and training stability. The intuition for AdaBelief is to adapt the stepsize according to the "belief" in the current gradient direction. Viewing the exponential moving average (EMA) of the noisy gradient as the prediction of the gradient at the next time step, if the observed gradient greatly deviates from the prediction, we distrust the current observation and take a small step; if the observed gradient is close to the prediction, we trust it and take a large step. We validate AdaBelief in extensive experiments, showing that it outperforms other methods with fast convergence and high accuracy on image classification and language modeling. Specifically, on ImageNet, AdaBelief achieves comparable accuracy to SGD. Furthermore, in the training of a GAN on Cifar10, AdaBelief demonstrates high stability and improves the quality of generated samples compared to a well-tuned Adam optimizer. Code is available at https://github.com/juntang-zhuang/Adabelief-Optimizer
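
The intuition in the AdaBelief abstract (small step when the observed gradient deviates from the EMA prediction, large step when it agrees) can be illustrated with a short pure-Python sketch. This is not the authors' released code: the function names `adabelief_step` and `step_size`, the default hyperparameters, and the Adam-style bias correction are assumptions made for illustration only.

```python
import math


def adabelief_step(theta, grad, m, s, t, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one belief-scaled update to a list of float parameters.

    theta, grad, m, s are equal-length lists; t is the 1-based step count.
    Returns the updated (theta, m, s) as new lists.
    Hypothetical helper: names and defaults are illustrative assumptions.
    """
    new_theta, new_m, new_s = [], [], []
    for p, g, m_i, s_i in zip(theta, grad, m, s):
        # EMA of the gradient, read as the prediction of the next gradient.
        m_i = beta1 * m_i + (1.0 - beta1) * g
        # EMA of the squared deviation between observation and prediction.
        s_i = beta2 * s_i + (1.0 - beta2) * (g - m_i) ** 2 + eps
        # Bias correction in the style of Adam (an assumption of this sketch).
        m_hat = m_i / (1.0 - beta1 ** t)
        s_hat = s_i / (1.0 - beta2 ** t)
        # Large deviation gives a large s_hat and therefore a small step.
        new_theta.append(p - lr * m_hat / (math.sqrt(s_hat) + eps))
        new_m.append(m_i)
        new_s.append(s_i)
    return new_theta, new_m, new_s


def step_size(observed_grad, predicted_grad=1.0, s_prev=0.01, t=10):
    """Magnitude of a single update given a prior EMA prediction of the gradient."""
    theta, _, _ = adabelief_step(
        [0.0], [observed_grad], [predicted_grad], [s_prev], t, lr=0.1
    )
    return abs(theta[0])
```

With the prior prediction fixed at 1.0, `step_size(1.0)` (observation agrees with the prediction) gives a larger update than `step_size(-3.0)` (observation strongly disagrees), which is the behavior the abstract describes in words.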

Frequent coauthors

  • Ramji Venkataramanan

    University of Cambridge

    19 shared
  • Sanjoy K. Mitter

    Decision Systems (United States)

    13 shared
  • Aditya Mahajan

    12 shared
  • Nicha C. Dvornek

    11 shared
  • Juntang Zhuang

    University of New Haven

    11 shared
  • Jian Ni

    11 shared
  • Patrick Rebeschini

    11 shared
  • Nicholas Ruozzi

    9 shared