Nima Anari

Assistant Professor of Computer Science · Verified

Stanford University · Demography

Active 2010–2026

h-index: 18

Citations: 1.1k

Papers: 125 · 62 last 5y

Funding: not listed



Research topics

Combinatorics
Mathematics

Selected publications

Fast Spanning Tree Sampling in Broadcast Congested Clique
ArXiv.org · 2026-03-26
article · Open access · 1st author · Corresponding
We present the first polylogarithmic-round algorithm for sampling a random spanning tree in the (Broadcast) Congested Clique model. For any constant $c > 0$, our algorithm outputs a sample from a distribution whose total variation distance from the uniform spanning tree distribution is at most $O(n^{-c})$ in at most $c \cdot \log^{O(1)}(n)$ rounds. The exponent hidden in $\log^{O(1)}(n)$ is an absolute constant independent of $c$ and $n$. This is an exponential improvement over the previous best algorithm of Pemmaraju, Roy, and Sobel (PODC 2025) for the Congested Clique model.
Optimal $e^{(γ+o(1))n}$-Approximation of the Permanent of Positive Semidefinite Matrices
arXiv (Cornell University) · 2026-05-21
preprint · Open access · 1st author · Corresponding
We determine, up to lower-order terms in the exponent, the best possible deterministic polynomial-time approximation ratio for the permanent of a Hermitian positive semidefinite matrix. If $A\succeq 0$ has no zero diagonal entry, $d=\operatorname{rank}(A)$, $A=VV^\dagger$ with $V\in\mathbb{C}^{n\times d}$ full column rank, and $v_1,\ldots,v_n$ are the rows of $V$, define \[ Φ(V)=\max_{X\succ 0} \left\{\sum_{i=1}^n \log(v_i^\dagger Xv_i)+\log\det X-\operatorname{tr} X+d\right\}, \qquad \widehat P(A)=e^{Φ(V)}. \] We prove the exact sandwich \[ e^{-γn}\widehat P(A)\le \operatorname{per}(A)\le \widehat P(A). \] Here $γ$ is the Euler--Mascheroni constant. Since the maximization is concave, this gives a deterministic polynomial-time $e^{(γ+\varepsilon)n}$-approximation for every $\varepsilon>0$. Combined with the previous $e^{(γ-\varepsilon)n}$-hardness of approximation for positive semidefinite permanents, this resolves the optimal exponential approximation ratio for deterministic polynomial-time algorithms as $e^{(γ+o(1))n}$, assuming $\mathrm{P}\ne\mathrm{NP}$. The proof is an entropy argument applied to the standard Wick integral formula for $\operatorname{per}(A)$; the loss is exactly $γ$ per factor because $\mathbb{E}[\log T]=-γ$ for $T\sim\operatorname{Exp}(1)$. The result was obtained through interactions with GPT 5.5 Pro Extended: the first author's interaction was one-shot, while the second author's was a separate multi-turn interaction with high-level guidance. Both authors verified the theorem and proof. Codex was used to assemble and typeset the manuscript.
Diffusion Models are Secretly Exchangeable: Parallelizing DDPMs via Autospeculation
ArXiv.org · 2025-05-06
preprint · Open access · Senior author
Denoising Diffusion Probabilistic Models (DDPMs) have emerged as powerful tools for generative modeling. However, their sequential computation requirements lead to significant inference-time bottlenecks. In this work, we utilize the connection between DDPMs and Stochastic Localization to prove that, under an appropriate reparametrization, the increments of DDPM satisfy an exchangeability property. This general insight enables near-black-box adaptation of various performance optimization techniques from autoregressive models to the diffusion setting. To demonstrate this, we introduce \emph{Autospeculative Decoding} (ASD), an extension of the widely used speculative decoding algorithm to DDPMs that does not require any auxiliary draft models. Our theoretical analysis shows that ASD achieves a $\tilde{O} (K^{\frac{1}{3}})$ parallel runtime speedup over the $K$ step sequential DDPM. We also demonstrate that a practical implementation of autospeculative decoding accelerates DDPM inference significantly in various domains.
Parallel Sampling via Autospeculation
ArXiv.org · 2025-11-11
preprint · Open access · 1st author · Corresponding
We present parallel algorithms to accelerate sampling via counting in two settings: any-order autoregressive models and denoising diffusion models. An any-order autoregressive model accesses a target distribution $μ$ on $[q]^n$ through an oracle that provides conditional marginals, while a denoising diffusion model accesses a target distribution $μ$ on $\mathbb{R}^n$ through an oracle that provides conditional means under Gaussian noise. Standard sequential sampling algorithms require $\widetilde{O}(n)$ time to produce a sample from $μ$ in either setting. We show that, by issuing oracle calls in parallel, the expected sampling time can be reduced to $\widetilde{O}(n^{1/2})$. This improves the previous $\widetilde{O}(n^{2/3})$ bound for any-order autoregressive models and yields the first parallel speedup for diffusion models in the high-accuracy regime, under the relatively mild assumption that the support of $μ$ is bounded. We introduce a novel technique to obtain our results: speculative rejection sampling. This technique leverages an auxiliary ``speculative'' distribution~$ν$ that approximates~$μ$ to accelerate sampling. Our technique is inspired by the well-studied ``speculative decoding'' techniques popular in large language models, but differs in key ways. Firstly, we use ``autospeculation,'' namely we build the speculation $ν$ out of the same oracle that defines~$μ$. In contrast, speculative decoding typically requires a separate, faster, but potentially less accurate ``draft'' model $ν$. Secondly, the key differentiating factor in our technique is that we make and accept speculations at a ``sequence'' level rather than at the level of single (or a few) steps. This last fact is key to unlocking our parallel runtime of $\widetilde{O}(n^{1/2})$.
Trickle-Down in Localization Schemes and Applications
2024-06-10 · 2 citations
article · Open access · 1st author · Corresponding
Trickle-down is a phenomenon in high-dimensional expanders with many important applications — for example, it is a key ingredient in various constructions of high-dimensional expanders or the proof of rapid mixing for the basis exchange walk on matroids and in the analysis of log-concave polynomials. We formulate a generalized trickle-down equation in the abstract context of linear-tilt localization schemes. Building on this generalization, we improve the best-known results for several Markov chain mixing or sampling problems — for example, we improve the threshold up to which Glauber dynamics is known to mix rapidly in the Sherrington-Kirkpatrick spin glass model. Other applications of our framework include near-linear time sampling algorithms from the antiferromagnetic Ising model and the fixed magnetization (antiferromagnetic or ferromagnetic) Ising model on expanders. For this application, we use a new dynamics inspired by polarization, a technique from the theory of stable polynomials.
Optimal Sublinear Sampling of Spanning Trees and Determinantal Point Processes via Average-Case Entropic Independence
SIAM Journal on Computing · 2024-09-09
article · 1st author · Corresponding
Parallel Sampling via Counting
2024-06-10 · 2 citations
article · Open access · 1st author · Corresponding
We show how to use parallelization to speed up sampling from an arbitrary distribution $μ$ on a product space $[q]^n$, given oracle access to counting queries: $\mathbb{P}_{X\sim μ}[X_S=σ_S]$ for any $S\subseteq [n]$ and $σ_S\in [q]^S$. Our algorithm takes $O(n^{2/3}\cdot \operatorname{polylog}(n,q))$ parallel time, to the best of our knowledge, the first sublinear in $n$ runtime for arbitrary distributions. Our results have implications for sampling in autoregressive models. Our algorithm directly works with an equivalent oracle that answers conditional marginal queries $\mathbb{P}_{X\sim μ}[X_i=σ_i \mid X_S=σ_S]$, whose role is played by a trained neural network in autoregressive models. This suggests a roughly $n^{1/3}$-factor speedup is possible for sampling in any-order autoregressive models. We complement our positive result by showing a lower bound of $Ω(n^{1/3})$ for the runtime of any parallel sampling algorithm making at most $\operatorname{poly}(n)$ queries to the counting oracle, even for $q=2$.
Batch Active Learning of Reward Functions from Human Preferences
arXiv (Cornell University) · 2024-02-24
preprint · Open access
Data generation and labeling are often expensive in robot learning. Preference-based learning is a concept that enables reliable labeling by querying users with preference questions. Active querying methods are commonly employed in preference-based learning to generate more informative data at the expense of parallelization and computation time. In this paper, we develop a set of novel algorithms, batch active preference-based learning methods, that enable efficient learning of reward functions using as few data samples as possible while still having short query generation times and also retaining parallelizability. We introduce a method based on determinantal point processes (DPP) for active batch generation and several heuristic-based alternatives. Finally, we present our experimental results for a variety of robotics tasks in simulation. Our results suggest that our batch active learning algorithm requires only a few queries that are computed in a short amount of time. We showcase one of our algorithms in a study to learn human users' preferences.

Frequent coauthors

Shayan Oveis Gharan · 44 shared
Thuy-Duong Vuong · Stanford University · 26 shared
Amin Saberi · 20 shared
Cynthia Vinzant · 16 shared
Kuikui Liu · Massachusetts Institute of Technology · 14 shared
Vijay V. Vazirani · University of California, Irvine · 10 shared
Frederic Koehler · University of Chicago · 9 shared
Dorsa Sadigh · DeepMind (United Kingdom) · 8 shared

Labs

Vice Provost for Student Affairs · PI
