Aws Albarghouthi

· Associate ProfessorVerified

University of Wisconsin-Madison · Computer Sciences

Active 2010–2026

h-index22

Citations1.8k

Papers11249 last 5y

Funding$1.6M

Faculty page Lab page

See your match with Aws Albarghouthi — sign in to PhdFit.Sign in

About

Aws Albarghouthi is an associate professor of computer science at the University of Wisconsin-Madison. His research focuses on the art and science of program synthesis and verification. He is a member of the madPL and quantum computing groups and the Wisconsin Quantum Institute. His group is primarily dedicated to automatically synthesizing the software stack for future quantum computers, addressing the need for a powerful, automated software infrastructure as physicists develop better qubits. This work involves pioneering new methods that diverge from classical computing software stacks to meet the demands of 21st-century quantum hardware. Albarghouthi's interests also extend to AI agent construction and correctness, where his team is working on building agents with guarantees of correctness and security, moving away from ad-hoc, limited-benchmark approaches. His contributions include work on compiler generation, quantum error correction, neural network verification, and the development of tools and benchmarks for deep research. He has been recognized for his leadership and contributions through keynote speeches, awards, and service roles in major conferences.

Research topics

Artificial Intelligence
Computer Science
Mathematics
Theoretical computer science
Computer Security
Programming language
Algorithm
Parallel computing

Selected publications

Linear-Time T-Gate Optimization via Random Abstraction
ArXiv.org · 2026-05-13
articleOpen access1st authorCorresponding
Quantum computers promise exponential speedups for problems in cryptography, chemistry, and optimization. Realizing this promise requires fault tolerance: physical qubits are noisy, so logical qubits must be encoded redundantly across many physical ones using quantum error-correcting codes. In most practical fault-tolerance schemes, T gates cannot be implemented transversally and instead require costly magic-state distillation protocols involving a complex set of operations. As a result, T-gate count can dominate the resource budget of large-scale quantum computations, making T-count minimization a central bottleneck on the path to quantum advantage. Existing T-count optimization tools, however, do not scale to the circuits that quantum advantage demands. We present theoretical and practical results on T-gate optimization. On the theoretical side, we give a linear-time randomized algorithm for phase folding, based on a novel randomized static analysis. Our static analysis soundly approximates the set of reachable quantum states with an arbitrarily high probability. Our key insight is a static analysis that does not track symbolic expressions, but propagates constant-width bitstrings down the circuit. On the practical side, our implementation, TZAP, is multiple orders of magnitude faster than state-of-the-art tools -- such as PyZX, VOQC, and Feynman -- closely matches their T-count reductions on standard benchmarks, and within seconds on a laptop computer can optimize circuits with millions of gates.
Publisher OA PDF
Linear-Time T-Gate Optimization via Random Abstraction
arXiv (Cornell University) · 2026-05-13
preprintOpen access1st authorCorresponding
Quantum computers promise exponential speedups for problems in cryptography, chemistry, and optimization. Realizing this promise requires fault tolerance: physical qubits are noisy, so logical qubits must be encoded redundantly across many physical ones using quantum error-correcting codes. In most practical fault-tolerance schemes, T gates cannot be implemented transversally and instead require costly magic-state distillation protocols involving a complex set of operations. As a result, T-gate count can dominate the resource budget of large-scale quantum computations, making T-count minimization a central bottleneck on the path to quantum advantage. Existing T-count optimization tools, however, do not scale to the circuits that quantum advantage demands. We present theoretical and practical results on T-gate optimization. On the theoretical side, we give a linear-time randomized algorithm for phase folding, based on a novel randomized static analysis. Our static analysis soundly approximates the set of reachable quantum states with an arbitrarily high probability. Our key insight is a static analysis that does not track symbolic expressions, but propagates constant-width bitstrings down the circuit. On the practical side, our implementation, TZAP, is multiple orders of magnitude faster than state-of-the-art tools -- such as PyZX, VOQC, and Feynman -- closely matches their T-count reductions on standard benchmarks, and within seconds on a laptop computer can optimize circuits with millions of gates.
Publisher DOI
SkillOrchestra: Learning to Route Agents via Skill Transfer
arXiv (Cornell University) · 2026-02-23
preprintOpen access
Compound AI systems promise capabilities beyond those of individual models, yet their success depends critically on effective orchestration. Existing routing approaches face two limitations: (1) input-level routers make coarse query-level decisions that ignore evolving task requirements; (2) RL-trained orchestrators are expensive to adapt and often suffer from routing collapse, repeatedly invoking one strong but costly option in multi-turn scenarios. We introduce SkillOrchestra, a framework for skill-aware orchestration. Instead of directly learning a routing policy end-to-end, SkillOrchestra learns fine-grained skills from execution experience and models agent-specific competence and cost under those skills. At deployment, the orchestrator infers the skill demands of the current interaction and selects agents that best satisfy them under an explicit performance-cost trade-off. Extensive experiments across ten benchmarks demonstrate that SkillOrchestra outperforms SoTA RL-based orchestrators by up to 22.5% with 700x and 300x learning cost reduction compared to Router-R1 and ToolOrchestra, respectively. These results show that explicit skill modeling enables scalable, interpretable, and sample-efficient orchestration, offering a principled alternative to data-intensive RL-based approaches. The code is available at: https://github.com/jiayuww/SkillOrchestra.
Publisher DOI
Test-Time Scaling Makes Overtraining Compute-Optimal
arXiv (Cornell University) · 2026-04-01
preprintOpen access
Modern LLMs scale at test-time, e.g. via repeated sampling, where inference cost grows with model size and the number of samples. This creates a trade-off that pretraining scaling laws, such as Chinchilla, do not address. We present Train-to-Test ($T^2$) scaling laws that jointly optimize model size, training tokens, and number of inference samples under fixed end-to-end budgets. $T^2$ modernizes pretraining scaling laws with pass@$k$ modeling used for test-time scaling, then jointly optimizes pretraining and test-time decisions. Forecasts from $T^2$ are robust over distinct modeling approaches: measuring joint scaling effect on the task loss and modeling impact on task accuracy. Across eight downstream tasks, we find that when accounting for inference cost, optimal pretraining decisions shift radically into the overtraining regime, well-outside of the range of standard pretraining scaling suites. We validate our results by pretraining heavily overtrained models in the optimal region that $T^2$ scaling forecasts, confirming their substantially stronger performance compared to pretraining scaling alone. Finally, as frontier LLMs are post-trained, we show that our findings survive the post-training stage, making $T^2$ scaling meaningful in modern deployments.
Publisher DOI
SkillOrchestra: Learning to Route Agents via Skill Transfer
arXiv (Cornell University) · 2026-01-01
articleOpen access
Compound AI systems promise capabilities beyond those of individual models, yet their success depends critically on effective orchestration. Existing routing approaches face two limitations: (1) input-level routers make coarse query-level decisions that ignore evolving task requirements; (2) RL-trained orchestrators are expensive to adapt and often suffer from routing collapse, repeatedly invoking one strong but costly option in multi-turn scenarios. We introduce SkillOrchestra, a framework for skill-aware orchestration. Instead of directly learning a routing policy end-to-end, SkillOrchestra learns fine-grained skills from execution experience and models agent-specific competence and cost under those skills. At deployment, the orchestrator infers the skill demands of the current interaction and selects agents that best satisfy them under an explicit performance-cost trade-off. Extensive experiments across ten benchmarks demonstrate that SkillOrchestra outperforms SoTA RL-based orchestrators by up to 22.5% with 700x and 300x learning cost reduction compared to Router-R1 and ToolOrchestra, respectively. These results show that explicit skill modeling enables scalable, interpretable, and sample-efficient orchestration, offering a principled alternative to data-intensive RL-based approaches. The code is available at: https://github.com/jiayuww/SkillOrchestra.
Publisher OA PDF
Analyzing Decoders for Quantum Error Correction
arXiv (Cornell University) · 2026-03-20
preprintOpen accessSenior author
Quantum error correction (QEC) enables reliable computation on noisy hardware by encoding logical information across many physical qubits and periodically measuring parities to detect errors. A decoder is the classical algorithm that uses these measurements to infer which error most likely occurred, so that the system can correct it. The decoder's accuracy-how rarely it makes the wrong guess-directly determines the scale of quantum computation that can be reliably executed. With a wealth of competing decoding algorithms, a QEC system designer needs reliable methods to evaluate them. Today, the dominant approach is to evaluate decoders using Monte Carlo simulation. However, simulation has several drawbacks such as requiring many samples to produce low variance estimates. In this work, we develop a new systematic analysis for evaluating decoders. We introduce a novel formal semantics of a core language for QEC programs that captures the de facto standard Stim circuit format, providing a principled theoretical foundation for the emerging space of fault-tolerant quantum systems design. Given a QEC program and a decoder, our verifier can quantify both the decoder accuracy and the decoder robustness to drift in physical error rate. Our approach has two key components: (i) a structured search over the space of possible errors; and (ii) a constrained polynomial optimization kernel. A thorough empirical evaluation of our approach suggests that it can outperform simulation, especially in low error rate regimes, and that it can be deployed to quantify decoder robustness over an interval of physical error rates.
Publisher DOI
Generating Compilers for Qubit Mapping and Routing
Proceedings of the ACM on Programming Languages · 2026-01-08 · 1 citations
articleOpen accessSenior author
To evaluate a quantum circuit on a quantum processor, one must find a mapping from circuit qubits to processor qubits and plan the instruction execution while satisfying the processor's constraints. This is known as the qubit mapping and routing (QMR) problem. High-quality QMR solutions are key to maximizing the utility of scarce quantum resources and minimizing the probability of logical errors affecting computation. The challenge is that the landscape of quantum processors is incredibly diverse and fast-evolving. Given this diversity, dozens of papers have addressed the QMR problem for different qubit hardware, connectivity constraints, and quantum error correction schemes by a developing a new algorithm for a particular context. We present an alternative approach: automatically generating qubit mapping and routing compilers for arbitrary quantum processors. Though each QMR problem is different, we identify a common core structure—device state machine—that we use to formulate an abstract QMR problem. Our formulation naturally leads to a compact domain-specific language for specifying QMR problems and a powerful parametric algorithm that can be instantiated for any QMR specification. Our thorough evaluation on case studies of important QMR problems shows that generated compilers are competitive with handwritten, specialized compilers in terms of runtime and solution quality.
Publisher DOI
Analyzing Decoders for Quantum Error Correction
ArXiv.org · 2026-03-20
articleOpen accessSenior author
Quantum error correction (QEC) enables reliable computation on noisy hardware by encoding logical information across many physical qubits and periodically measuring parities to detect errors. A decoder is the classical algorithm that uses these measurements to infer which error most likely occurred, so that the system can correct it. The decoder's accuracy-how rarely it makes the wrong guess-directly determines the scale of quantum computation that can be reliably executed. With a wealth of competing decoding algorithms, a QEC system designer needs reliable methods to evaluate them. Today, the dominant approach is to evaluate decoders using Monte Carlo simulation. However, simulation has several drawbacks such as requiring many samples to produce low variance estimates. In this work, we develop a new systematic analysis for evaluating decoders. We introduce a novel formal semantics of a core language for QEC programs that captures the de facto standard Stim circuit format, providing a principled theoretical foundation for the emerging space of fault-tolerant quantum systems design. Given a QEC program and a decoder, our verifier can quantify both the decoder accuracy and the decoder robustness to drift in physical error rate. Our approach has two key components: (i) a structured search over the space of possible errors; and (ii) a constrained polynomial optimization kernel. A thorough empirical evaluation of our approach suggests that it can outperform simulation, especially in low error rate regimes, and that it can be deployed to quantify decoder robustness over an interval of physical error rates.
Publisher OA PDF
Test-Time Scaling Makes Overtraining Compute-Optimal
ArXiv.org · 2026-04-01
articleOpen access
Modern LLMs scale at test-time, e.g. via repeated sampling, where inference cost grows with model size and the number of samples. This creates a trade-off that pretraining scaling laws, such as Chinchilla, do not address. We present Train-to-Test ($T^2$) scaling laws that jointly optimize model size, training tokens, and number of inference samples under fixed end-to-end budgets. $T^2$ modernizes pretraining scaling laws with pass@$k$ modeling used for test-time scaling, then jointly optimizes pretraining and test-time decisions. Forecasts from $T^2$ are robust over distinct modeling approaches: measuring joint scaling effect on the task loss and modeling impact on task accuracy. Across eight downstream tasks, we find that when accounting for inference cost, optimal pretraining decisions shift radically into the overtraining regime, well-outside of the range of standard pretraining scaling suites. We validate our results by pretraining heavily overtrained models in the optimal region that $T^2$ scaling forecasts, confirming their substantially stronger performance compared to pretraining scaling alone. Finally, as frontier LLMs are post-trained, we show that our findings survive the post-training stage, making $T^2$ scaling meaningful in modern deployments.
Publisher OA PDF
A Case for Elastic Quantum Error Correction Decoders
2026-04-24
articleOpen access
Large-scale quantum computers promise transformative speedups, but their viability hinges on fast and reliable quantum error correction (QEC). At the center of QEC are decoders—classical algorithms running on hardware such as FPGAs, GPUs, or CPUs that process error syndromes to detect errors every microsecond to preserve fault-tolerance. Quantum processors, therefore, operate not in isolation, but as accelerators tightly coupled with powerful classical digital hardware. A key challenge is that decoder demand fluctuates unpredictably: bursts of activity can require orders of magnitude more decodes than idle periods. Provisioning hardware for the worst case wastes resources, while provisioning for the average case risks catastrophic slowdowns. We show that this mismatch is a systems problem of capacity planning and scheduling, and propose a two-level framework that treats decoders as shared accelerators managed by the quantum operating system. Our approach reduces decoder requirements by 10–40% across fault-tolerant benchmarks, demonstrating that efficient decoder scheduling is essential to making FTQC practical.
Publisher DOI

Recent grants

CRII: SHF: Optimal Interpolation for Efficient Proof Synthesis
NSF · $175k · 2016–2020
SHF: Medium: Formal Methods for Program Fairness
NSF · $1.0M · 2017–2021
CAREER: Algorithmic Foundations and Modern Applications for Program Synthesis
NSF · $450k · 2017–2024

Frequent coauthors

Loris D’Antoni
29 shared
Arie Gurfinkel
16 shared
Samuel Drews
University of Wisconsin–Madison
14 shared
Justin Hsu
12 shared
Goutham Ramakrishnan
10 shared
Somesh Jha
9 shared
Paraschos Koutris
University of Wisconsin–Madison
9 shared
Calvin Smith
University of Wisconsin–Madison
9 shared

Labs

madPLPI
The home website for the UW-Madison Programming Languages research group: madPL

Awards & honors

PLDI Keynote speaker (2026)
PLDI Distinguished artifact award (2025)
Amazon research award (2022)
Class of 1955 Teaching Excellence Award (2022)
SIGPLAN research highlight for PLDI (2021)

Resume-aware match score
Save to shortlist
AI-drafted outreach

See your match with Aws Albarghouthi

PhdFit ranks faculty by your research interests, methods, and publications — grounded in their actual work, not templates.

Join the waitlist How it works

Free to start
No credit card
30-second signup

Find professors who actually fit you