
Michael James Neely
· Professor of Electrical and Computer EngineeringVerifiedUniversity of Southern California · Ming Hsieh Department of Electrical and Computer Engineering
Active 1979–2025
About
Michael J. Neely received B.S. degrees in both Electrical Engineering and Mathematics from the University of Maryland, College Park, in 1997. He was awarded a three-year Department of Defense NDSEG Fellowship for graduate study at the Massachusetts Institute of Technology, where he earned an M.S. degree in 1999 and a Ph.D. in 2003, both in Electrical Engineering. Since joining the faculty of Electrical Engineering at the University of Southern California in 2004, he has served as an Associate Professor. His research interests are in the areas of stochastic network optimization and queueing theory, with applications to wireless networks, mobile ad-hoc networks, and switching systems. His work includes analysis and design for wireless and computer networks, ad-hoc mobile networks, and general queueing systems, focusing on energy and throughput efficient scheduling in networks with random events and an unpredictable future.
Research topics
- Computer Science
- Mathematics
- Machine Learning
- Mathematical optimization
- Algorithm
- Applied mathematics
- Combinatorics
- Mathematical analysis
Selected publications
Bandit-Based Rate Adaptation for a Single-Server Queue
arXiv (Cornell University) · 2025-12-12
preprintOpen accessSenior authorThis paper considers the problem of obtaining bounded time-average expected queue sizes in a single-queue system with a partial-feedback structure. Time is slotted; in slot $t$ the transmitter chooses a rate $V(t)$ from a continuous interval. Transmission succeeds if and only if $V(t)\le C(t)$, where channel capacities $\{C(t)\}$ and arrivals are i.i.d. draws from fixed but unknown distributions. The transmitter observes only binary acknowledgments (ACK/NACK) indicating success or failure. Let $\varepsilon>0$ denote a sufficiently small lower bound on the slack between the arrival rate and the capacity region. We propose a phased algorithm that progressively refines a discretization of the uncountable infinite rate space and, without knowledge of $\varepsilon$, achieves a $\mathcal{O}\!\big(\log^{3.5}(1/\varepsilon)/\varepsilon^{3}\big)$ time-average expected queue size uniformly over the horizon. We also prove a converse result showing that for any rate-selection algorithm, regardless of whether $\varepsilon$ is known, there exists an environment in which the worst-case time-average expected queue size is $Ω(1/\varepsilon^{2})$. Thus, while a gap remains in the setting without knowledge of $\varepsilon$, we show that if $\varepsilon$ is known, a simple single-stage UCB type policy with a fixed discretization of the rate space achieves $\mathcal{O}\!\big(\log(1/\varepsilon)/\varepsilon^{2}\big)$, matching the converse up to logarithmic factors.
Adaptive Optimization for Stochastic Renewal Systems
Mathematics · 2025-12-01
articleOpen access1st authorCorrespondingThis paper considers online optimization for a sequence of tasks. Each task can be processed in one of multiple processing modes that affect the duration of the task, the reward earned, and an additional vector of penalties (such as energy or cost). Let A[k] be a random matrix that specifies the parameters of task k. The goal is to observe A[k] at the start of task k and then choose a processing mode for the task so that, over time, time average reward is maximized subject to time average penalty constraints. This is a renewal optimization problem. It is challenging because the probability distribution for the A[k] sequence is unknown. Efficient decisions must be learned in a timely manner. Prior work shows that any algorithm that comes within ϵ of optimality must have Ω(1/ϵ2) convergence time. The only known algorithm that can meet this bound operates without time average penalty constraints and uses a diminishing stepsize that cannot adapt when probabilities change. This paper develops a new algorithm that is adaptive and comes within Θ(ϵ) of optimality for any interval of Θ(1/ϵ2) tasks over which probabilities are held fixed.
Online Multi-player Resource-Sharing Games with Bandit Feedback
Dynamic Games and Applications · 2025-11-24
articleOpen accessSenior authorAbstract This paper considers an online multi-player resource-sharing game with bandit feedback. Multiple players choose from a finite collection of resources in a time slotted system. In each time slot, each resource brings a random reward that is equally divided among the players who choose it. The reward vector is independent and identically distributed over the time slots. The statistics of the reward vector are unknown to the players. During each time slot, for each resource chosen by the first player, they receive as feedback the reward of the resource and the number of players who chose it, after the choice is made. We develop a novel Upper Confidence Bound (UCB) algorithm that learns the mean rewards using the feedback and maximizes the worst-case time-average expected reward of the first player. The algorithm gets within $$\mathcal {O}(\log (T)/\sqrt{T})$$ of optimality within T time slots. The simulations depict fast convergence of the learnt policy in comparison to the worst-case optimal policy.
Automatic Link Selection in Multi-Channel Multiple Access with Link Failures
ArXiv.org · 2025-01-24
preprintOpen accessThis paper focuses on the problem of automatic link selection in multi-channel multiple access control using bandit feedback. In particular, a controller assigns multiple users to multiple channels in a time-slotted system, where in each time slot, at most one user can be assigned to a given channel, and at most one channel can be assigned to a given user. Given that user $i$ is assigned to channel $j$, the transmission fails with a fixed unknown probability $1-q_{i,j}$. The assignments are made dynamically using success/failure feedback. The goal is to maximize the time-average utility, where we consider an arbitrary (possibly nonsmooth) concave, entrywise nondecreasing utility function. The first proposed algorithm has fast $\mathcal{O}(\sqrt{\log(T)/T})$ convergence. However, this algorithm requires solving a convex optimization problem within each iteration, which can be computationally expensive. The second algorithm has slower $\mathcal{O}(\sqrt[3]{\log(T)/T})$ convergence, while avoiding the costly inner optimization. Both of these algorithms are adaptive. In particular, the convergence guarantee holds for any interval of $T$ consecutive slots during which the success probabilities do not change. We further study several special cases. In the single-channel setting, we obtain both fast $\mathcal{O}(\sqrt{\log(T)/T})$ convergence and efficient implementation via a simpler adaptive mechanism. We also consider a UCB-based non-adaptive algorithm with max-weight-type decisions. Simulations highlight intriguing performance trade-offs and demonstrate rapid adaptation of the proposed adaptive schemes.
Adaptive Algorithms for Automatic Link Selection in Multiple Access with Link Failures
2025-05-26
articleSenior authorThis paper focuses on the problem of automatic link selection in multi-channel multiple access control using bandit feedback. In particular, a controller assigns multiple users to multiple channels in a time slotted system, where in each time slot at most one user can be assigned to a given channel and at most one channel can be assigned to a given user. Given that user <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">$i$</tex> is assigned to channel <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">$j$</tex>, the transmission fails with a fixed probability <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">$f_{i, j}$</tex>. The failure probabilities are not known to the controller. The assignments are made dynamically using success/failure feedback. The goal is to maximize the time average utility, where we consider an arbitrary (possibly nonsmooth) concave and entrywise nondecreasing utility function. The problem of merely maximizing the total throughput has a solution of always assigning the same user-channel pairs and can be unfair to certain users, particularly when the number of channels is less than the number of users. Instead, our scheme allows various types of fairness, such as proportional fairness, maximizing the minimum, or combinations of these by defining the appropriate utility function. We propose an algorithm for this task that is adaptive and gets within <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">$\mathcal{O}\left(\log (T) / T^{1 / 3}\right)$</tex> of optimality over any interval of <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">$T$</tex> consecutive slots over which the success probabilities do not change. This performance is improved to <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">$\mathcal{O}(1 / \sqrt{T})$</tex> for single-channel problems with a minimum constraint on the rate of transmission attempts per user.
Opportunistic Learning for Markov Decision Systems with Application to Smart Robots
arXiv (Cornell University) · 2024-08-09
preprintOpen access1st authorCorrespondingThis paper presents an online method that learns optimal decisions for a discrete time Markov decision problem with an opportunistic structure. The state at time $t$ is a pair $(S(t),W(t))$ where $S(t)$ takes values in a finite set $\mathcal{S}$ of basic states, and $\{W(t)\}_{t=0}^{\infty}$ is an i.i.d. sequence of random vectors that affect the system and that have an unknown distribution. Every slot $t$ the controller observes $(S(t),W(t))$ and chooses a control action $A(t)$. The triplet $(S(t),W(t),A(t))$ determines a vector of costs and the transition probabilities for the next state $S(t+1)$. The goal is to minimize the time average of an objective function subject to additional time average cost constraints. We develop an algorithm that acts on a corresponding virtual system where $S(t)$ is replaced by a decision variable. An equivalence between virtual and actual systems is established by enforcing a collection of time averaged global balance equations. For any desired $ε>0$, we prove the algorithm achieves an $ε$-optimal solution on the virtual system with a convergence time of $O(1/ε^2)$. The actual system runs at the same time, its actions are informed by the virtual system, and its conditional transition probabilities and costs are proven to be the same as the virtual system at every instant of time. Also, its unconditional probabilities and costs are shown in simulation to closely match the virtual system. Our simulations consider online control of a robot that explores a region of interest. Objects with varying rewards appear and disappear and the robot learns what areas to explore and what objects to collect and deliver to a home base.
Nonsmooth projection-free optimization with functional constraints
Computational Optimization and Applications · 2024-09-26 · 1 citations
articleOpen accessSenior authorAbstract This paper presents a subgradient-based algorithm for constrained nonsmooth convex optimization that does not require projections onto the feasible set. While the well-established Frank–Wolfe algorithm and its variants already avoid projections, they are primarily designed for smooth objective functions. In contrast, our proposed algorithm can handle nonsmooth problems with general convex functional inequality constraints. It achieves an $$\epsilon $$ <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML"> <mml:mi>ϵ</mml:mi> </mml:math> -suboptimal solution in $$\mathcal {O}(\epsilon ^{-2})$$ <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML"> <mml:mrow> <mml:mi>O</mml:mi> <mml:mo>(</mml:mo> <mml:msup> <mml:mi>ϵ</mml:mi> <mml:mrow> <mml:mo>-</mml:mo> <mml:mn>2</mml:mn> </mml:mrow> </mml:msup> <mml:mo>)</mml:mo> </mml:mrow> </mml:math> iterations, with each iteration requiring only a single (potentially inexact) Linear Minimization Oracle call and a (possibly inexact) subgradient computation. This performance is consistent with existing lower bounds. Similar performance is observed when deterministic subgradients are replaced with stochastic subgradients. In the special case where there are no functional inequality constraints, our algorithm competes favorably with a recent nonsmooth projection-free method designed for constraint-free problems. Our approach utilizes a simple separation scheme in conjunction with a new Lagrange multiplier update rule.
Multi-Player Resource-Sharing Games with Fair Reward Allocation
arXiv (Cornell University) · 2024-02-07
preprintOpen accessSenior authorThis paper considers an online multi-player resource-sharing game with bandit feedback. Multiple players choose from a finite collection of resources in a time slotted system. In each time slot, each resource brings a random reward that is equally divided among the players who choose it. The reward vector is independent and identically distributed over the time slots. The statistics of the reward vector are unknown to the players. During each time slot, for each resource chosen by the first player, they receive as feedback the reward of the resource and the number of players who chose it, after the choice is made. We develop a novel Upper Confidence Bound (UCB) algorithm that learns the mean rewards using the feedback and maximizes the worst-case time-average expected reward of the first player. The algorithm gets within $\mathcal{O}(\log(T)/\sqrt{T})$ of optimality within $T$ time slots. The simulations depict fast convergence of the learnt policy in comparison to the worst-case optimal policy.
Opportunistic Learning for Markov Decision Systems with Application to Smart Robots
2024-09-24 · 2 citations
article1st authorCorrespondingThis paper presents an online method that learns optimal decisions for a discrete time Markov decision problem with an opportunistic structure. The state at time <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">$t$</tex> is a pair <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">$(S(t),\ W(t))$</tex> where <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">$S(t)$</tex> takes values in a finite set <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">$\mathcal{S}$</tex> of basic states, and <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">$\{W(t)\}_{t=0}^{\infty}$</tex> is an i.i.d. sequence of random vectors that affect the system and that have an unknown distribution. Every slot <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">$t$</tex> the controller observes <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">$(S(t),\ W(t))$</tex> and chooses a control action <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">$A(t)$</tex>. The triplet <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">$(S(t),\ W(t),\ A(t))$</tex> determines a vector of costs and the transition probabilities for the next state <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">$S(t+1)$</tex>. The goal is to minimize the time average of an objective function subject to additional time average cost constraints. We develop an algorithm that acts on a corresponding virtual system where <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">$S(t)$</tex> is replaced by a decision variable. An equivalence between virtual and actual systems is established by enforcing a collection of time averaged global balance equations. For any desired <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">$\epsilon > 0$</tex>, we prove the algorithm achieves an <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">$\epsilon$</tex>.optimal solution on the virtual system with a convergence time of <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">$O(1/\epsilon^{2})$</tex>. The actual system runs at the same time, its actions are informed by the virtual system, and its conditional transition probabilities and costs are proven to be the same as the virtual system at every instant of time. Also, its unconditional probabilities and costs are shown in simulation to closely match the virtual system. Our simulations consider online control of a robot that explores a region of interest. Objects with varying rewards appear and disappear and the robot learns what areas to explore and what objects to collect and deliver to a home base.
Adaptive Optimization for Stochastic Renewal Systems
arXiv (Cornell University) · 2024-01-13
preprintOpen access1st authorCorrespondingThis paper considers online optimization for a system that performs a sequence of back-to-back tasks. Each task can be processed in one of multiple processing modes that affect the duration of the task, the reward earned, and an additional vector of penalties (such as energy or cost). Let $A[k]$ be a random matrix of parameters that specifies the duration, reward, and penalty vector under each processing option for task $k$. The goal is to observe $A[k]$ at the start of each new task $k$ and then choose a processing mode for the task so that, over time, time average reward is maximized subject to time average penalty constraints. This is a \emph{renewal optimization problem} and is challenging because the probability distribution for the $A[k]$ sequence is unknown. Prior work shows that any algorithm that comes within $ε$ of optimality must have $Ω(1/ε^2)$ convergence time. The only known algorithm that can meet this bound operates without time average penalty constraints and uses a diminishing stepsize that cannot adapt when probabilities change. This paper develops a new algorithm that is adaptive and comes within $O(ε)$ of optimality for any interval of $Θ(1/ε^2)$ tasks over which probabilities are held fixed, regardless of probabilities before the start of the interval.
Recent grants
NSF · $175k · 2018–2023
CAREER: "Analysis and Control of Network Delay"
NSF · $400k · 2008–2015
NeTS:Small:Optimal Learning Times for Task-Oriented Communication Networks
NSF · $477k · 2017–2022
Frequent coauthors
- 36 shared
Xiaohan Wei
University of Edinburgh
- 36 shared
Rahul Urgaonkar
Amazon (United States)
- 36 shared
Hao Yu
Changchun University of Science and Technology
- 23 shared
Longbo Huang
Tsinghua University
- 21 shared
Eytan Modiano
Massachusetts Institute of Technology
- 15 shared
Sucha Supittayapornpong
Vidyasirimedhi Institute of Science and Technology
- 14 shared
Chih‐Ping Li
Saitama Red Cross Hospital
- 12 shared
Giuseppe Caire
Technische Universität Berlin
Education
- 1990
Ph.D., Electrical Engineering
University of Southern California
- 1986
M.S., Electrical Engineering
University of Southern California
- 1984
B.S., Electrical Engineering
University of Southern California
Awards & honors
- NSF Career Award (2008)
- Viterbi School of Engineering Junior Faculty Research Award…
- Research Grant Award from the Okawa Foundation (2012)
- Okawa Foundation (2013)
- IEEE Senior Member (2008)
- Resume-aware match score
- Save to shortlist
- AI-drafted outreach
See your match with Michael James Neely
PhdFit ranks faculty by your research interests, methods, and publications — grounded in their actual work, not templates.
- Free to start
- No credit card
- 30-second signup