Rayadurgam Srikant

Professor, Electrical and Computer Engineering · Verified

University of Illinois Urbana-Champaign · Computer Science

Active 1990–2025

h-index: 80
Citations: 26.2k
Papers: 522 · 71 last 5y
Funding: $4.5M · 1 active

About

Rayadurgam Srikant is a Professor in the Department of Electrical and Computer Engineering and the Coordinated Science Laboratory at the University of Illinois Urbana-Champaign. He is also a Grainger Distinguished Chair in Engineering and one of the co-Directors of the C3.ai Digital Transformation Institute. His research interests include machine learning, applied probability, stochastic control, and communication networks. He has authored or co-authored several books on communication networks, network optimization, and internet congestion control. Dr. Srikant has received numerous awards, including the 2015 INFOCOM Achievement Award, the 2019 IEEE Koji Kobayashi Computers and Communications Award, and the 2021 ACM SIGMETRICS Achievement Award. He has served as Editor-in-Chief of the IEEE/ACM Transactions on Networking and is currently an Area Editor for the Mathematics of Operations Research. His contributions to the field are recognized through his leadership roles, editorial positions, and the success of his advisees who hold faculty positions or leadership roles in industry.

Research topics

  • Computer Science
  • Artificial Intelligence
  • Machine Learning
  • Computer Security
  • Data Mining
  • Economics
  • Econometrics
  • Computer network
  • Mathematical optimization
  • Distributed computing
  • Mathematical economics
  • Mathematics

Selected publications

  • On the Gaussian Limit of the Output of IIR Filters

    2025-12-09 · 1 citation

    article · Senior author

    We study the asymptotic distribution of the output of a stable Linear Time-Invariant (LTI) system driven by a non-Gaussian stochastic input. Motivated by longstanding heuristics in the stochastic describing function method, we rigorously characterize when the output process becomes approximately Gaussian, even when the input is not. Using the Wasserstein-1 distance as a quantitative measure of non-Gaussianity, we derive upper bounds on the distance between the appropriately scaled output and a standard normal distribution. These bounds are obtained via Stein’s method and depend explicitly on the system’s impulse response and the dependence structure of the input process. We show that when the dominant pole of the system approaches the edge of stability and the input satisfies one of the following conditions—(i) independence, (ii) positive correlation with a real and positive dominant pole, or (iii) sufficient correlation decay—the output converges to a standard normal distribution at rate $O\left( {1/\sqrt t } \right)$. We also present counterexamples where convergence fails, thereby motivating the stated assumptions. Our results provide a rigorous foundation for the widespread observation that outputs of low-pass LTI systems tend to be approximately Gaussian.

  • Convergence of Natural Policy Gradient for a family of infinite-state queueing MDPs

    Queueing Systems · 2025-08-07 · 2 citations

    article · Open access · Senior author

    A wide variety of queueing systems can be naturally modeled as infinite-state Markov Decision Processes (MDPs). In the reinforcement learning (RL) context, a variety of algorithms have been developed to learn and optimize these MDPs. At the heart of many popular policy-gradient-based learning algorithms, such as natural actor-critic, TRPO, and PPO, lies the Natural Policy Gradient (NPG) policy optimization algorithm. Convergence results for these RL algorithms rest on convergence results for the NPG algorithm. However, all existing results on the convergence of the NPG algorithm are limited to finite-state settings. We study a general class of queueing MDPs and prove an $O(1/\sqrt{T})$ convergence rate for the NPG algorithm, if the NPG algorithm is initialized with the MaxWeight policy. This is the first convergence rate bound for the NPG algorithm for a general class of infinite-state average-reward MDPs. Moreover, our result applies beyond the queueing setting to any countably infinite MDP satisfying certain mild structural assumptions, given a sufficiently good initial policy. Key to our result are state-dependent bounds on the relative value function achieved by the iterate policies of the NPG algorithm.

  • Joint Optimal Transport and Embedding for Network Alignment

    2025-04-22 · 8 citations

    article · Open access

    Network alignment, which aims to find node correspondence across different networks, is the cornerstone of various downstream multi-network and Web mining tasks. Most of the embedding-based methods indirectly model cross-network node relationships by contrasting positive and negative node pairs sampled from hand-crafted strategies, which are vulnerable to graph noises and lead to potential misalignment of nodes. Another line of work based on the optimal transport (OT) theory directly models cross-network node relationships and generates noise-reduced alignments. However, OT methods heavily rely on fixed, pre-defined cost functions that prohibit end-to-end training and are hard to generalize. In this paper, we aim to unify the embedding and OT-based methods in a mutually beneficial manner and propose a joint optimal transport and embedding framework for network alignment named JOENA. For one thing (OT for embedding), through a simple yet effective transformation, the noise-reduced OT mapping serves as an adaptive sampling strategy directly modeling all cross-network node pairs for robust embedding learning. For another (embedding for OT), on top of the learned embeddings, the OT cost can be gradually trained in an end-to-end fashion, which further enhances the alignment quality. With a unified objective, the mutual benefits of both methods can be achieved by an alternating optimization schema with guaranteed convergence. Extensive experiments on real-world networks validate the effectiveness and scalability of JOENA, achieving up to 16% improvement in MRR and 20 times speedup compared with the state-of-the-art alignment methods.

  • On the Convergence of Modified Policy Iteration in Risk-Sensitive Exponential Cost Markov Decision Processes

    Operations Research · 2025-11-27

    article · Senior author

    Balancing Risk and Robustness in Dynamic Decision Making. Many real systems, such as networks, finance, and safety-critical autonomy, must hedge against rare but costly events. Risk-sensitive control formalizes this idea by optimizing an exponential cost objective that prioritizes reliability over just average performance. Classical dynamic programming methods such as value iteration and policy iteration are well-understood in this risk-sensitive setting. However, modified policy iteration (MPI), which combines the strengths of both through partial policy evaluation, has lacked any theoretical understanding. This paper addresses this gap. It analyzes MPI for risk-sensitive Markov decision processes governed by a multiplicative Bellman equation, develops normalization and contraction tools suited to this setting, and proves both convergence and finite-time guarantees. The results provide a principled foundation for algorithms that combine computational efficiency with robustness, supporting the development of reinforcement learning methods that emphasize long-term reliability.

  • Provably Convergent Primal-Dual DPO for Constrained LLM Alignment

    ArXiv.org · 2025-10-07

    preprint · Open access · Senior author

    The widespread application of large language models (LLMs) raises increasing demands on ensuring safety or imposing constraints, such as reducing harmful content and adhering to predefined rules. While there have been several works studying LLM safety alignment, these works either need to train three models and incur high memory costs, or require prior knowledge on the optimal solution. Witnessing this fact, we investigate the constrained alignment problem for LLMs, i.e., maximizing the reward of outputs while restricting the cost to stay below a threshold. We propose a novel primal-dual direct preference optimization (DPO) approach, which first trains a model using standard DPO on reward preference data to provide reward information, and then adopts a rearranged Lagrangian DPO objective utilizing the provided reward information to fine-tune LLMs. Our approach only needs to train two models rather than three, which significantly saves memory costs, and does not require extra prior knowledge. Moreover, we establish rigorous suboptimality and constraint violation guarantees. We also extend our approach to enable online exploration and drop the data coverage dependence in the results. Experiments on the PKU-SafeRLHF and TruthfulQA datasets demonstrate the state-of-the-art performance of our approach.

  • Decentralized and Uncoordinated Learning of Stable Matchings: A Game-Theoretic Approach

    Proceedings of the AAAI Conference on Artificial Intelligence · 2025-04-11 · 1 citation

    article · Open access · Senior author

    We consider the problem of learning stable matchings with unknown preferences in a decentralized and uncoordinated manner, where "decentralized" means that players make decisions individually without the influence of a central platform, and "uncoordinated" means that players do not need to synchronize their decisions using pre-specified rules. First, we provide a game formulation for this problem with known preferences, where the set of pure Nash equilibria (NE) coincides with the set of stable matchings, and mixed NE can be rounded to a stable matching. Then, we show that for hierarchical markets, applying the exponential weight (EXP) learning algorithm to the stable matching game achieves logarithmic regret in a fully decentralized and uncoordinated fashion. Moreover, we show that EXP converges locally and exponentially fast to a stable matching in general matching markets. We complement our results by introducing another decentralized and uncoordinated learning algorithm that globally converges to a stable matching with arbitrarily high probability.

  • Joint Optimal Transport and Embedding for Network Alignment

    ArXiv.org · 2025-02-26

    preprint · Open access

    Network alignment, which aims to find node correspondence across different networks, is the cornerstone of various downstream multi-network and Web mining tasks. Most of the embedding-based methods indirectly model cross-network node relationships by contrasting positive and negative node pairs sampled from hand-crafted strategies, which are vulnerable to graph noises and lead to potential misalignment of nodes. Another line of work based on the optimal transport (OT) theory directly models cross-network node relationships and generates noise-reduced alignments. However, OT methods heavily rely on fixed, pre-defined cost functions that prohibit end-to-end training and are hard to generalize. In this paper, we aim to unify the embedding and OT-based methods in a mutually beneficial manner and propose a joint optimal transport and embedding framework for network alignment named JOENA. For one thing (OT for embedding), through a simple yet effective transformation, the noise-reduced OT mapping serves as an adaptive sampling strategy directly modeling all cross-network node pairs for robust embedding learning. For another (embedding for OT), on top of the learned embeddings, the OT cost can be gradually trained in an end-to-end fashion, which further enhances the alignment quality. With a unified objective, the mutual benefits of both methods can be achieved by an alternating optimization schema with guaranteed convergence. Extensive experiments on real-world networks validate the effectiveness and scalability of JOENA, achieving up to 16% improvement in MRR and 20x speedup compared with the state-of-the-art alignment methods.

  • Rates of Convergence in the Central Limit Theorem for Markov Chains, with an Application to TD Learning

    Mathematics of Operations Research · 2025-10-03

    article · 1st author · Corresponding

    We prove a nonasymptotic central limit theorem (CLT) for vector-valued martingale differences using Stein’s method, and we use Poisson’s equation to extend the result to functions of Markov chains. We then show that these results can be applied to establish a nonasymptotic CLT for temporal difference learning with averaging. Funding: This work was supported by National Science Foundation [Grants CNS 23-12714, CCF 22-07547, and CNS 21-06801] and Air Force Office of Scientific Research [Grant FA9550-24-1-0002].

  • Scalable Policy-Based RL Algorithms for POMDPs

    ArXiv.org · 2025-10-08

    preprint · Open access · Senior author

    The continuous nature of belief states in POMDPs presents significant computational challenges in learning the optimal policy. In this paper, we consider an approach that solves a Partially Observable Reinforcement Learning (PORL) problem by approximating the corresponding POMDP model into a finite-state Markov Decision Process (MDP) (called Superstate MDP). We first derive theoretical guarantees that improve upon prior work that relate the optimal value function of the transformed Superstate MDP to the optimal value function of the original POMDP. Next, we propose a policy-based learning approach with linear function approximation to learn the optimal policy for the Superstate MDP. Consequently, our approach shows that a POMDP can be approximately solved using TD-learning followed by Policy Optimization by treating it as an MDP, where the MDP state corresponds to a finite history. We show that the approximation error decreases exponentially with the length of this history. To the best of our knowledge, our finite-time bounds are the first to explicitly quantify the error introduced when applying standard TD learning to a setting where the true dynamics are not Markovian.

  • Optimal Hybrid Feedback-Driven Learning for Wireless Interactive Panoramic Scene Delivery

    2025-10-23

    article · Open access · Senior author

    Immersive technologies, such as virtual and augmented reality, demand high framerate, low latency, and precise synchronization between real and virtual environments. To meet these requirements, an edge server typically needs to perform high-quality rendering, and must predict user head motion and transmit a portion of the rendered panoramic scene that is large enough to cover the user's viewport, yet small enough to satisfy bandwidth constraints. Each portion yields two feedback signals: prediction feedback, indicating whether the selected portion covers the actual viewport, and transmission feedback, indicating whether all data packets are successfully delivered. While prior work models this setting as a multi-armed bandit with two-level bandit feedback, it overlooks that prediction feedback can be retrospectively computed for all possible portions, thus providing full-information feedback. In this work, we introduce a new two-level feedback model that combines full-information feedback with bandit feedback, and we formulate the portion selection problem as an online learning task under this hybrid setting. We derive an instance-dependent regret lower bound for this new hybrid feedback setting, and we propose AdaPort, a hybrid learning algorithm that leverages both the full-information feedback and bandit feedback to improve learning efficiency. We then show that the instance-dependent regret upper bound for AdaPort matches the lower bound asymptotically, proving its asymptotic optimality. Simulations using synthetic data and real-world traces demonstrate that AdaPort consistently outperforms state-of-the-art baselines, validating the benefits of exploiting the hybrid feedback structure.

Education

  • Ph.D., Computer Science

    University of California, Berkeley

    1990
  • M.S., Computer Science

    University of California, Berkeley

    1986
  • B.S., Electrical Engineering

    Indian Institute of Technology, Madras

    1982

Awards & honors

  • 2015 INFOCOM Achievement Award
  • 2019 IEEE Koji Kobayashi Computers and Communications Award
  • 2021 ACM SIGMETRICS Achievement Award
  • Best Paper Award at INFOCOM (2015)
  • Best Publication Award from Applied Probability Society (201…