Michael James Neely

· Professor of Electrical and Computer EngineeringVerified

University of Southern California · Ming Hsieh Department of Electrical and Computer Engineering

Active 1979–2025

h-index56

Citations20.2k

Papers34639 last 5y

Funding$1.1M

Faculty page Lab page

See your match with Michael James Neely — sign in to PhdFit.Sign in

About

Michael J. Neely received B.S. degrees in both Electrical Engineering and Mathematics from the University of Maryland, College Park, in 1997. He was awarded a three-year Department of Defense NDSEG Fellowship for graduate study at the Massachusetts Institute of Technology, where he earned an M.S. degree in 1999 and a Ph.D. in 2003, both in Electrical Engineering. Since joining the faculty of Electrical Engineering at the University of Southern California in 2004, he has served as an Associate Professor. His research interests are in the areas of stochastic network optimization and queueing theory, with applications to wireless networks, mobile ad-hoc networks, and switching systems. His work includes analysis and design for wireless and computer networks, ad-hoc mobile networks, and general queueing systems, focusing on energy and throughput efficient scheduling in networks with random events and an unpredictable future.

Research topics

Computer Science
Mathematics
Machine Learning
Mathematical optimization
Algorithm
Applied mathematics
Combinatorics
Mathematical analysis

Selected publications

Bandit-Based Rate Adaptation for a Single-Server Queue
arXiv (Cornell University) · 2025-12-12
preprintOpen accessSenior author
This paper considers the problem of obtaining bounded time-average expected queue sizes in a single-queue system with a partial-feedback structure. Time is slotted; in slot $t$ the transmitter chooses a rate $V(t)$ from a continuous interval. Transmission succeeds if and only if $V(t)\le C(t)$, where channel capacities $\{C(t)\}$ and arrivals are i.i.d. draws from fixed but unknown distributions. The transmitter observes only binary acknowledgments (ACK/NACK) indicating success or failure. Let $\varepsilon>0$ denote a sufficiently small lower bound on the slack between the arrival rate and the capacity region. We propose a phased algorithm that progressively refines a discretization of the uncountable infinite rate space and, without knowledge of $\varepsilon$, achieves a $\mathcal{O}\!\big(\log^{3.5}(1/\varepsilon)/\varepsilon^{3}\big)$ time-average expected queue size uniformly over the horizon. We also prove a converse result showing that for any rate-selection algorithm, regardless of whether $\varepsilon$ is known, there exists an environment in which the worst-case time-average expected queue size is $Ω(1/\varepsilon^{2})$. Thus, while a gap remains in the setting without knowledge of $\varepsilon$, we show that if $\varepsilon$ is known, a simple single-stage UCB type policy with a fixed discretization of the rate space achieves $\mathcal{O}\!\big(\log(1/\varepsilon)/\varepsilon^{2}\big)$, matching the converse up to logarithmic factors.
Publisher OA PDF DOI
Adaptive Optimization for Stochastic Renewal Systems
Mathematics · 2025-12-01
articleOpen access1st authorCorresponding
This paper considers online optimization for a sequence of tasks. Each task can be processed in one of multiple processing modes that affect the duration of the task, the reward earned, and an additional vector of penalties (such as energy or cost). Let A[k] be a random matrix that specifies the parameters of task k. The goal is to observe A[k] at the start of task k and then choose a processing mode for the task so that, over time, time average reward is maximized subject to time average penalty constraints. This is a renewal optimization problem. It is challenging because the probability distribution for the A[k] sequence is unknown. Efficient decisions must be learned in a timely manner. Prior work shows that any algorithm that comes within ϵ of optimality must have Ω(1/ϵ2) convergence time. The only known algorithm that can meet this bound operates without time average penalty constraints and uses a diminishing stepsize that cannot adapt when probabilities change. This paper develops a new algorithm that is adaptive and comes within Θ(ϵ) of optimality for any interval of Θ(1/ϵ2) tasks over which probabilities are held fixed.
Publisher DOI
Online Multi-player Resource-Sharing Games with Bandit Feedback
Dynamic Games and Applications · 2025-11-24
articleOpen accessSenior author
Abstract This paper considers an online multi-player resource-sharing game with bandit feedback. Multiple players choose from a finite collection of resources in a time slotted system. In each time slot, each resource brings a random reward that is equally divided among the players who choose it. The reward vector is independent and identically distributed over the time slots. The statistics of the reward vector are unknown to the players. During each time slot, for each resource chosen by the first player, they receive as feedback the reward of the resource and the number of players who chose it, after the choice is made. We develop a novel Upper Confidence Bound (UCB) algorithm that learns the mean rewards using the feedback and maximizes the worst-case time-average expected reward of the first player. The algorithm gets within $$\mathcal {O}(\log (T)/\sqrt{T})$$ of optimality within T time slots. The simulations depict fast convergence of the learnt policy in comparison to the worst-case optimal policy.
Publisher OA PDF DOI
Automatic Link Selection in Multi-Channel Multiple Access with Link Failures
ArXiv.org · 2025-01-24
preprintOpen access
This paper focuses on the problem of automatic link selection in multi-channel multiple access control using bandit feedback. In particular, a controller assigns multiple users to multiple channels in a time-slotted system, where in each time slot, at most one user can be assigned to a given channel, and at most one channel can be assigned to a given user. Given that user $i$ is assigned to channel $j$, the transmission fails with a fixed unknown probability $1-q_{i,j}$. The assignments are made dynamically using success/failure feedback. The goal is to maximize the time-average utility, where we consider an arbitrary (possibly nonsmooth) concave, entrywise nondecreasing utility function. The first proposed algorithm has fast $\mathcal{O}(\sqrt{\log(T)/T})$ convergence. However, this algorithm requires solving a convex optimization problem within each iteration, which can be computationally expensive. The second algorithm has slower $\mathcal{O}(\sqrt[3]{\log(T)/T})$ convergence, while avoiding the costly inner optimization. Both of these algorithms are adaptive. In particular, the convergence guarantee holds for any interval of $T$ consecutive slots during which the success probabilities do not change. We further study several special cases. In the single-channel setting, we obtain both fast $\mathcal{O}(\sqrt{\log(T)/T})$ convergence and efficient implementation via a simpler adaptive mechanism. We also consider a UCB-based non-adaptive algorithm with max-weight-type decisions. Simulations highlight intriguing performance trade-offs and demonstrate rapid adaptation of the proposed adaptive schemes.
Publisher OA PDF DOI
Adaptive Algorithms for Automatic Link Selection in Multiple Access with Link Failures
2025-05-26
articleSenior author
This paper focuses on the problem of automatic link selection in multi-channel multiple access control using bandit feedback. In particular, a controller assigns multiple users to multiple channels in a time slotted system, where in each time slot at most one user can be assigned to a given channel and at most one channel can be assigned to a given user. Given that user <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">$i$</tex> is assigned to channel <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">$j$</tex>, the transmission fails with a fixed probability <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">$f_{i, j}$</tex>. The failure probabilities are not known to the controller. The assignments are made dynamically using success/failure feedback. The goal is to maximize the time average utility, where we consider an arbitrary (possibly nonsmooth) concave and entrywise nondecreasing utility function. The problem of merely maximizing the total throughput has a solution of always assigning the same user-channel pairs and can be unfair to certain users, particularly when the number of channels is less than the number of users. Instead, our scheme allows various types of fairness, such as proportional fairness, maximizing the minimum, or combinations of these by defining the appropriate utility function. We propose an algorithm for this task that is adaptive and gets within <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">$\mathcal{O}\left(\log (T) / T^{1 / 3}\right)$</tex> of optimality over any interval of <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">$T$</tex> consecutive slots over which the success probabilities do not change. This performance is improved to <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">$\mathcal{O}(1 / \sqrt{T})$</tex> for single-channel problems with a minimum constraint on the rate of transmission attempts per user.
Publisher DOI
Opportunistic Learning for Markov Decision Systems with Application to Smart Robots
arXiv (Cornell University) · 2024-08-09
preprintOpen access1st authorCorresponding
This paper presents an online method that learns optimal decisions for a discrete time Markov decision problem with an opportunistic structure. The state at time $t$ is a pair $(S(t),W(t))$ where $S(t)$ takes values in a finite set $\mathcal{S}$ of basic states, and $\{W(t)\}_{t=0}^{\infty}$ is an i.i.d. sequence of random vectors that affect the system and that have an unknown distribution. Every slot $t$ the controller observes $(S(t),W(t))$ and chooses a control action $A(t)$. The triplet $(S(t),W(t),A(t))$ determines a vector of costs and the transition probabilities for the next state $S(t+1)$. The goal is to minimize the time average of an objective function subject to additional time average cost constraints. We develop an algorithm that acts on a corresponding virtual system where $S(t)$ is replaced by a decision variable. An equivalence between virtual and actual systems is established by enforcing a collection of time averaged global balance equations. For any desired $ε>0$, we prove the algorithm achieves an $ε$-optimal solution on the virtual system with a convergence time of $O(1/ε^2)$. The actual system runs at the same time, its actions are informed by the virtual system, and its conditional transition probabilities and costs are proven to be the same as the virtual system at every instant of time. Also, its unconditional probabilities and costs are shown in simulation to closely match the virtual system. Our simulations consider online control of a robot that explores a region of interest. Objects with varying rewards appear and disappear and the robot learns what areas to explore and what objects to collect and deliver to a home base.
Publisher OA PDF DOI
Nonsmooth projection-free optimization with functional constraints
Computational Optimization and Applications · 2024-09-26 · 1 citations
articleOpen accessSenior author
Abstract This paper presents a subgradient-based algorithm for constrained nonsmooth convex optimization that does not require projections onto the feasible set. While the well-established Frank–Wolfe algorithm and its variants already avoid projections, they are primarily designed for smooth objective functions. In contrast, our proposed algorithm can handle nonsmooth problems with general convex functional inequality constraints. It achieves an $$\epsilon $$ <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML"> <mml:mi>ϵ</mml:mi> </mml:math> -suboptimal solution in $$\mathcal {O}(\epsilon ^{-2})$$ <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML"> <mml:mrow> <mml:mi>O</mml:mi> <mml:mo>(</mml:mo> <mml:msup> <mml:mi>ϵ</mml:mi> <mml:mrow> <mml:mo>-</mml:mo> <mml:mn>2</mml:mn> </mml:mrow> </mml:msup> <mml:mo>)</mml:mo> </mml:mrow> </mml:math> iterations, with each iteration requiring only a single (potentially inexact) Linear Minimization Oracle call and a (possibly inexact) subgradient computation. This performance is consistent with existing lower bounds. Similar performance is observed when deterministic subgradients are replaced with stochastic subgradients. In the special case where there are no functional inequality constraints, our algorithm competes favorably with a recent nonsmooth projection-free method designed for constraint-free problems. Our approach utilizes a simple separation scheme in conjunction with a new Lagrange multiplier update rule.
Publisher OA PDF DOI
Multi-Player Resource-Sharing Games with Fair Reward Allocation
arXiv (Cornell University) · 2024-02-07
preprintOpen accessSenior author
This paper considers an online multi-player resource-sharing game with bandit feedback. Multiple players choose from a finite collection of resources in a time slotted system. In each time slot, each resource brings a random reward that is equally divided among the players who choose it. The reward vector is independent and identically distributed over the time slots. The statistics of the reward vector are unknown to the players. During each time slot, for each resource chosen by the first player, they receive as feedback the reward of the resource and the number of players who chose it, after the choice is made. We develop a novel Upper Confidence Bound (UCB) algorithm that learns the mean rewards using the feedback and maximizes the worst-case time-average expected reward of the first player. The algorithm gets within $\mathcal{O}(\log(T)/\sqrt{T})$ of optimality within $T$ time slots. The simulations depict fast convergence of the learnt policy in comparison to the worst-case optimal policy.
Publisher OA PDF DOI
Opportunistic Learning for Markov Decision Systems with Application to Smart Robots
2024-09-24 · 2 citations
article1st authorCorresponding
This paper presents an online method that learns optimal decisions for a discrete time Markov decision problem with an opportunistic structure. The state at time <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">$t$</tex> is a pair <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">$(S(t),\ W(t))$</tex> where <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">$S(t)$</tex> takes values in a finite set <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">$\mathcal{S}$</tex> of basic states, and <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">$\{W(t)\}_{t=0}^{\infty}$</tex> is an i.i.d. sequence of random vectors that affect the system and that have an unknown distribution. Every slot <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">$t$</tex> the controller observes <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">$(S(t),\ W(t))$</tex> and chooses a control action <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">$A(t)$</tex>. The triplet <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">$(S(t),\ W(t),\ A(t))$</tex> determines a vector of costs and the transition probabilities for the next state <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">$S(t+1)$</tex>. The goal is to minimize the time average of an objective function subject to additional time average cost constraints. We develop an algorithm that acts on a corresponding virtual system where <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">$S(t)$</tex> is replaced by a decision variable. An equivalence between virtual and actual systems is established by enforcing a collection of time averaged global balance equations. For any desired <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">$\epsilon > 0$</tex>, we prove the algorithm achieves an <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">$\epsilon$</tex>.optimal solution on the virtual system with a convergence time of <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">$O(1/\epsilon^{2})$</tex>. The actual system runs at the same time, its actions are informed by the virtual system, and its conditional transition probabilities and costs are proven to be the same as the virtual system at every instant of time. Also, its unconditional probabilities and costs are shown in simulation to closely match the virtual system. Our simulations consider online control of a robot that explores a region of interest. Objects with varying rewards appear and disappear and the robot learns what areas to explore and what objects to collect and deliver to a home base.
Publisher DOI
Adaptive Optimization for Stochastic Renewal Systems
arXiv (Cornell University) · 2024-01-13
preprintOpen access1st authorCorresponding
This paper considers online optimization for a system that performs a sequence of back-to-back tasks. Each task can be processed in one of multiple processing modes that affect the duration of the task, the reward earned, and an additional vector of penalties (such as energy or cost). Let $A[k]$ be a random matrix of parameters that specifies the duration, reward, and penalty vector under each processing option for task $k$. The goal is to observe $A[k]$ at the start of each new task $k$ and then choose a processing mode for the task so that, over time, time average reward is maximized subject to time average penalty constraints. This is a \emph{renewal optimization problem} and is challenging because the probability distribution for the $A[k]$ sequence is unknown. Prior work shows that any algorithm that comes within $ε$ of optimality must have $Ω(1/ε^2)$ convergence time. The only known algorithm that can meet this bound operates without time average penalty constraints and uses a diminishing stepsize that cannot adapt when probabilities change. This paper develops a new algorithm that is adaptive and comes within $O(ε)$ of optimality for any interval of $Θ(1/ε^2)$ tasks over which probabilities are held fixed, regardless of probabilities before the start of the interval.
Publisher OA PDF DOI

Recent grants

SpecEES: Collaborative Research: Leveraging Randomization and Human Behavior for Efficient Large-Scale Distributed Spectrum Access
NSF · $175k · 2018–2023
CAREER: "Analysis and Control of Network Delay"
NSF · $400k · 2008–2015
NeTS:Small:Optimal Learning Times for Task-Oriented Communication Networks
NSF · $477k · 2017–2022

Frequent coauthors

Xiaohan Wei
University of Edinburgh
36 shared
Rahul Urgaonkar
Amazon (United States)
36 shared
Hao Yu
Changchun University of Science and Technology
36 shared
Longbo Huang
Tsinghua University
23 shared
Eytan Modiano
Massachusetts Institute of Technology
21 shared
Sucha Supittayapornpong
Vidyasirimedhi Institute of Science and Technology
15 shared
Chih‐Ping Li
Saitama Red Cross Hospital
14 shared
Giuseppe Caire
Technische Universität Berlin
12 shared

Education

Ph.D., Electrical Engineering
University of Southern California
1990
M.S., Electrical Engineering
University of Southern California
1986
B.S., Electrical Engineering
University of Southern California
1984

Awards & honors

NSF Career Award (2008)
Viterbi School of Engineering Junior Faculty Research Award…
Research Grant Award from the Okawa Foundation (2012)
Okawa Foundation (2013)
IEEE Senior Member (2008)

Resume-aware match score
Save to shortlist
AI-drafted outreach

See your match with Michael James Neely

PhdFit ranks faculty by your research interests, methods, and publications — grounded in their actual work, not templates.

Join the waitlist How it works

Free to start
No credit card
30-second signup

Find professors who actually fit you