Rayadurgam Srikant

Professor, Electrical and Computer Engineering · Verified

University of Illinois Urbana-Champaign · Computer Science

Active 1990–2025

h-index: 80

Citations: 26.2k

Papers: 522 · 71 last 5y

Funding: $4.5M · 1 active

About

Rayadurgam Srikant is a Professor in the Department of Electrical and Computer Engineering and the Coordinated Science Laboratory at the University of Illinois Urbana-Champaign. He is also a Grainger Distinguished Chair in Engineering and one of the co-Directors of the C3.ai Digital Transformation Institute. His research interests include machine learning, applied probability, stochastic control, and communication networks. He has authored or co-authored several books on communication networks, network optimization, and internet congestion control. Dr. Srikant has received numerous awards, including the 2015 INFOCOM Achievement Award, the 2019 IEEE Koji Kobayashi Computers and Communications Award, and the 2021 ACM SIGMETRICS Achievement Award. He has served as Editor-in-Chief of the IEEE/ACM Transactions on Networking and is currently an Area Editor for the Mathematics of Operations Research. His contributions to the field are recognized through his leadership roles, editorial positions, and the success of his advisees who hold faculty positions or leadership roles in industry.

Research topics

Computer Science
Artificial Intelligence
Machine Learning
Computer Security
Data Mining
Economics
Econometrics
Computer network
Mathematical optimization
Distributed computing
Mathematical economics
Mathematics

Selected publications

On the Gaussian Limit of the Output of IIR Filters
2025-12-09 · 1 citation
article · Senior author
We study the asymptotic distribution of the output of a stable Linear Time-Invariant (LTI) system driven by a non-Gaussian stochastic input. Motivated by longstanding heuristics in the stochastic describing function method, we rigorously characterize when the output process becomes approximately Gaussian, even when the input is not. Using the Wasserstein-1 distance as a quantitative measure of non-Gaussianity, we derive upper bounds on the distance between the appropriately scaled output and a standard normal distribution. These bounds are obtained via Stein’s method and depend explicitly on the system’s impulse response and the dependence structure of the input process. We show that when the dominant pole of the system approaches the edge of stability and the input satisfies one of three conditions, namely (i) independence, (ii) positive correlation with a real and positive dominant pole, or (iii) sufficient correlation decay, the output converges to a standard normal distribution at rate $O(1/\sqrt{t})$. We also present counterexamples where convergence fails, thereby motivating the stated assumptions. Our results provide a rigorous foundation for the widespread observation that outputs of low-pass LTI systems tend to be approximately Gaussian.
Convergence of Natural Policy Gradient for a family of infinite-state queueing MDPs
Queueing Systems · 2025-08-07 · 2 citations
article · Open access · Senior author
A wide variety of queueing systems can be naturally modeled as infinite-state Markov Decision Processes (MDPs). In the reinforcement learning (RL) context, a variety of algorithms have been developed to learn and optimize these MDPs. At the heart of many popular policy-gradient-based learning algorithms, such as natural actor-critic, TRPO, and PPO, lies the Natural Policy Gradient (NPG) policy optimization algorithm. Convergence results for these RL algorithms rest on convergence results for the NPG algorithm. However, all existing results on the convergence of the NPG algorithm are limited to finite-state settings. We study a general class of queueing MDPs and prove an $O(1/\sqrt{T})$ convergence rate for the NPG algorithm, if the NPG algorithm is initialized with the MaxWeight policy. This is the first convergence rate bound for the NPG algorithm for a general class of infinite-state average-reward MDPs. Moreover, our result applies beyond the queueing setting to any countably infinite MDP satisfying certain mild structural assumptions, given a sufficiently good initial policy. Key to our result are state-dependent bounds on the relative value function achieved by the iterate policies of the NPG algorithm.
Joint Optimal Transport and Embedding for Network Alignment
2025-04-22 · 8 citations
article · Open access
Network alignment, which aims to find node correspondence across different networks, is the cornerstone of various downstream multi-network and Web mining tasks. Most of the embedding-based methods indirectly model cross-network node relationships by contrasting positive and negative node pairs sampled from hand-crafted strategies, which are vulnerable to graph noises and lead to potential misalignment of nodes. Another line of work based on the optimal transport (OT) theory directly models cross-network node relationships and generates noise-reduced alignments. However, OT methods heavily rely on fixed, pre-defined cost functions that prohibit end-to-end training and are hard to generalize. In this paper, we aim to unify the embedding and OT-based methods in a mutually beneficial manner and propose a joint optimal transport and embedding framework for network alignment named JOENA. For one thing (OT for embedding), through a simple yet effective transformation, the noise-reduced OT mapping serves as an adaptive sampling strategy directly modeling all cross-network node pairs for robust embedding learning. For another (embedding for OT), on top of the learned embeddings, the OT cost can be gradually trained in an end-to-end fashion, which further enhances the alignment quality. With a unified objective, the mutual benefits of both methods can be achieved by an alternating optimization schema with guaranteed convergence. Extensive experiments on real-world networks validate the effectiveness and scalability of JOENA, achieving up to 16% improvement in MRR and 20 times speedup compared with the state-of-the-art alignment methods.
On the Convergence of Modified Policy Iteration in Risk-Sensitive Exponential Cost Markov Decision Processes
Operations Research · 2025-11-27
article · Senior author
Balancing Risk and Robustness in Dynamic Decision Making. Many real systems, such as networks, finance, and safety-critical autonomy, must hedge against rare but costly events. Risk-sensitive control formalizes this idea by optimizing an exponential cost objective that prioritizes reliability over just average performance. Classical dynamic programming methods such as value iteration and policy iteration are well-understood in this risk-sensitive setting. However, modified policy iteration (MPI), which combines the strengths of both through partial policy evaluation, has lacked any theoretical understanding. This paper addresses this gap. It analyzes MPI for risk-sensitive Markov decision processes governed by a multiplicative Bellman equation, develops normalization and contraction tools suited to this setting, and proves both convergence and finite-time guarantees. The results provide a principled foundation for algorithms that combine computational efficiency with robustness, supporting the development of reinforcement learning methods that emphasize long-term reliability.
Provably Convergent Primal-Dual DPO for Constrained LLM Alignment
ArXiv.org · 2025-10-07
preprint · Open access · Senior author
The widespread application of large language models (LLMs) raises increasing demands on ensuring safety or imposing constraints, such as reducing harmful content and adhering to predefined rules. While there have been several works studying LLM safety alignment, these works either need to train three models and incur high memory costs, or require prior knowledge on the optimal solution. Witnessing this fact, we investigate the constrained alignment problem for LLMs, i.e., maximizing the reward of outputs while restricting the cost to stay below a threshold. We propose a novel primal-dual direct preference optimization (DPO) approach, which first trains a model using standard DPO on reward preference data to provide reward information, and then adopts a rearranged Lagrangian DPO objective utilizing the provided reward information to fine-tune LLMs. Our approach only needs to train two models rather than three, which significantly saves memory costs, and does not require extra prior knowledge. Moreover, we establish rigorous suboptimality and constraint violation guarantees. We also extend our approach to enable online exploration and drop the data coverage dependence in the results. Experiments on the PKU-SafeRLHF and TruthfulQA datasets demonstrate the state-of-the-art performance of our approach.
Decentralized and Uncoordinated Learning of Stable Matchings: A Game-Theoretic Approach
Proceedings of the AAAI Conference on Artificial Intelligence · 2025-04-11 · 1 citation
article · Open access · Senior author
We consider the problem of learning stable matchings with unknown preferences in a decentralized and uncoordinated manner, where "decentralized" means that players make decisions individually without the influence of a central platform, and "uncoordinated" means that players do not need to synchronize their decisions using pre-specified rules. First, we provide a game formulation for this problem with known preferences, where the set of pure Nash equilibria (NE) coincides with the set of stable matchings, and mixed NE can be rounded to a stable matching. Then, we show that for hierarchical markets, applying the exponential weight (EXP) learning algorithm to the stable matching game achieves logarithmic regret in a fully decentralized and uncoordinated fashion. Moreover, we show that EXP converges locally and exponentially fast to a stable matching in general matching markets. We complement our results by introducing another decentralized and uncoordinated learning algorithm that globally converges to a stable matching with arbitrarily high probability.
Joint Optimal Transport and Embedding for Network Alignment
ArXiv.org · 2025-02-26
preprint · Open access
Network alignment, which aims to find node correspondence across different networks, is the cornerstone of various downstream multi-network and Web mining tasks. Most of the embedding-based methods indirectly model cross-network node relationships by contrasting positive and negative node pairs sampled from hand-crafted strategies, which are vulnerable to graph noises and lead to potential misalignment of nodes. Another line of work based on the optimal transport (OT) theory directly models cross-network node relationships and generates noise-reduced alignments. However, OT methods heavily rely on fixed, pre-defined cost functions that prohibit end-to-end training and are hard to generalize. In this paper, we aim to unify the embedding and OT-based methods in a mutually beneficial manner and propose a joint optimal transport and embedding framework for network alignment named JOENA. For one thing (OT for embedding), through a simple yet effective transformation, the noise-reduced OT mapping serves as an adaptive sampling strategy directly modeling all cross-network node pairs for robust embedding learning. For another (embedding for OT), on top of the learned embeddings, the OT cost can be gradually trained in an end-to-end fashion, which further enhances the alignment quality. With a unified objective, the mutual benefits of both methods can be achieved by an alternating optimization schema with guaranteed convergence. Extensive experiments on real-world networks validate the effectiveness and scalability of JOENA, achieving up to 16% improvement in MRR and 20x speedup compared with the state-of-the-art alignment methods.
Rates of Convergence in the Central Limit Theorem for Markov Chains, with an Application to TD Learning
Mathematics of Operations Research · 2025-10-03
article · 1st author · Corresponding
We prove a nonasymptotic central limit theorem (CLT) for vector-valued martingale differences using Stein’s method, and we use Poisson’s equation to extend the result to functions of Markov chains. We then show that these results can be applied to establish a nonasymptotic CLT for temporal difference learning with averaging. Funding: This work was supported by National Science Foundation [Grants CNS 23-12714, CCF 22-07547, and CNS 21-06801] and Air Force Office of Scientific Research [Grant FA9550-24-1-0002].
Scalable Policy-Based RL Algorithms for POMDPs
ArXiv.org · 2025-10-08
preprint · Open access · Senior author
The continuous nature of belief states in POMDPs presents significant computational challenges in learning the optimal policy. In this paper, we consider an approach that solves a Partially Observable Reinforcement Learning (PORL) problem by approximating the corresponding POMDP model into a finite-state Markov Decision Process (MDP) (called Superstate MDP). We first derive theoretical guarantees that improve upon prior work that relate the optimal value function of the transformed Superstate MDP to the optimal value function of the original POMDP. Next, we propose a policy-based learning approach with linear function approximation to learn the optimal policy for the Superstate MDP. Consequently, our approach shows that a POMDP can be approximately solved using TD-learning followed by Policy Optimization by treating it as an MDP, where the MDP state corresponds to a finite history. We show that the approximation error decreases exponentially with the length of this history. To the best of our knowledge, our finite-time bounds are the first to explicitly quantify the error introduced when applying standard TD learning to a setting where the true dynamics are not Markovian.
Optimal Hybrid Feedback-Driven Learning for Wireless Interactive Panoramic Scene Delivery
2025-10-23
article · Open access · Senior author
Immersive technologies, such as virtual and augmented reality, demand high framerate, low latency, and precise synchronization between real and virtual environments. To meet these requirements, an edge server typically needs to perform high-quality rendering, and must predict user head motion and transmit a portion of the rendered panoramic scene that is large enough to cover the user's viewport, yet small enough to satisfy bandwidth constraints. Each portion yields two feedback signals: prediction feedback, indicating whether the selected portion covers the actual viewport, and transmission feedback, indicating whether all data packets are successfully delivered. While prior work models this setting as a multi-armed bandit with two-level bandit feedback, it overlooks that prediction feedback can be retrospectively computed for all possible portions, thus providing full-information feedback. In this work, we introduce a new two-level feedback model that combines full-information feedback with bandit feedback, and we formulate the portion selection problem as an online learning task under this hybrid setting. We derive an instance-dependent regret lower bound for this new hybrid feedback setting, and we propose AdaPort, a hybrid learning algorithm that leverages both the full-information feedback and bandit feedback to improve learning efficiency. We then show that the instance-dependent regret upper bound for AdaPort matches the lower bound asymptotically, proving its asymptotic optimality. Simulations using synthetic data and real-world traces demonstrate that AdaPort consistently outperforms state-of-the-art baselines, validating the benefits of exploiting the hybrid feedback structure.

Recent grants

Collaborative Research: ITR : Network Coding - From Theory to Practice
NSF · $1.0M · 2003–2010
Collaborative Research: NETS-NBD: Towards a Multipath Network Architecture for Robust Data Transport
NSF · $415k · 2005–2010
Collaborative Research: Resource Allocation in Clouds: A Stochastic Modeling and Control Perspective
NSF · $225k · 2012–2017
FIND: Collaborative Research: Towards An Analytic Foundation for Network Architectures
NSF · $200k · 2007–2011
NeTS: Small: Collaborative Research: Fast Online Machine Learning Algorithms for Wireless Networks
NSF · $250k · 2017–2021

Frequent coauthors

Lei Ying
73 shared
Tamer Başar
62 shared
Carolyn L. Beck
45 shared
Georgios Fellouris
University of Illinois Urbana-Champaign
41 shared
Subhonmesh Bose
University of Illinois Urbana-Champaign
41 shared
Venugopal V. Veeravalli
University of Illinois Urbana-Champaign
41 shared
Mr Peterson
University of Illinois Urbana-Champaign
41 shared
Alejandro D. Domínguez-García
41 shared

Education

Ph.D., Computer Science
University of California, Berkeley
1990
M.S., Computer Science
University of California, Berkeley
1986
B.S., Electrical Engineering
Indian Institute of Technology, Madras
1982

Awards & honors

2015 INFOCOM Achievement Award
2019 IEEE Koji Kobayashi Computers and Communications Award
2021 ACM SIGMETRICS Achievement Award
Best Paper Award at INFOCOM (2015)
Best Publication Award from Applied Probability Society (201…
