
Aws Albarghouthi
· Associate ProfessorVerifiedUniversity of Wisconsin-Madison · Computer Sciences
Active 2010–2026
About
Aws Albarghouthi is an associate professor of computer science at the University of Wisconsin-Madison. His research focuses on the art and science of program synthesis and verification. He is a member of the madPL and quantum computing groups and the Wisconsin Quantum Institute. His group is primarily dedicated to automatically synthesizing the software stack for future quantum computers, addressing the need for a powerful, automated software infrastructure as physicists develop better qubits. This work involves pioneering new methods that diverge from classical computing software stacks to meet the demands of 21st-century quantum hardware. Albarghouthi's interests also extend to AI agent construction and correctness, where his team is working on building agents with guarantees of correctness and security, moving away from ad-hoc, limited-benchmark approaches. His contributions include work on compiler generation, quantum error correction, neural network verification, and the development of tools and benchmarks for deep research. He has been recognized for his leadership and contributions through keynote speeches, awards, and service roles in major conferences.
Research topics
- Artificial Intelligence
- Computer Science
- Mathematics
- Theoretical computer science
- Computer Security
- Programming language
- Algorithm
- Parallel computing
Selected publications
Linear-Time T-Gate Optimization via Random Abstraction
ArXiv.org · 2026-05-13
articleOpen access1st authorCorrespondingQuantum computers promise exponential speedups for problems in cryptography, chemistry, and optimization. Realizing this promise requires fault tolerance: physical qubits are noisy, so logical qubits must be encoded redundantly across many physical ones using quantum error-correcting codes. In most practical fault-tolerance schemes, T gates cannot be implemented transversally and instead require costly magic-state distillation protocols involving a complex set of operations. As a result, T-gate count can dominate the resource budget of large-scale quantum computations, making T-count minimization a central bottleneck on the path to quantum advantage. Existing T-count optimization tools, however, do not scale to the circuits that quantum advantage demands. We present theoretical and practical results on T-gate optimization. On the theoretical side, we give a linear-time randomized algorithm for phase folding, based on a novel randomized static analysis. Our static analysis soundly approximates the set of reachable quantum states with an arbitrarily high probability. Our key insight is a static analysis that does not track symbolic expressions, but propagates constant-width bitstrings down the circuit. On the practical side, our implementation, TZAP, is multiple orders of magnitude faster than state-of-the-art tools -- such as PyZX, VOQC, and Feynman -- closely matches their T-count reductions on standard benchmarks, and within seconds on a laptop computer can optimize circuits with millions of gates.
Linear-Time T-Gate Optimization via Random Abstraction
arXiv (Cornell University) · 2026-05-13
preprintOpen access1st authorCorrespondingQuantum computers promise exponential speedups for problems in cryptography, chemistry, and optimization. Realizing this promise requires fault tolerance: physical qubits are noisy, so logical qubits must be encoded redundantly across many physical ones using quantum error-correcting codes. In most practical fault-tolerance schemes, T gates cannot be implemented transversally and instead require costly magic-state distillation protocols involving a complex set of operations. As a result, T-gate count can dominate the resource budget of large-scale quantum computations, making T-count minimization a central bottleneck on the path to quantum advantage. Existing T-count optimization tools, however, do not scale to the circuits that quantum advantage demands. We present theoretical and practical results on T-gate optimization. On the theoretical side, we give a linear-time randomized algorithm for phase folding, based on a novel randomized static analysis. Our static analysis soundly approximates the set of reachable quantum states with an arbitrarily high probability. Our key insight is a static analysis that does not track symbolic expressions, but propagates constant-width bitstrings down the circuit. On the practical side, our implementation, TZAP, is multiple orders of magnitude faster than state-of-the-art tools -- such as PyZX, VOQC, and Feynman -- closely matches their T-count reductions on standard benchmarks, and within seconds on a laptop computer can optimize circuits with millions of gates.
SkillOrchestra: Learning to Route Agents via Skill Transfer
arXiv (Cornell University) · 2026-02-23
preprintOpen accessCompound AI systems promise capabilities beyond those of individual models, yet their success depends critically on effective orchestration. Existing routing approaches face two limitations: (1) input-level routers make coarse query-level decisions that ignore evolving task requirements; (2) RL-trained orchestrators are expensive to adapt and often suffer from routing collapse, repeatedly invoking one strong but costly option in multi-turn scenarios. We introduce SkillOrchestra, a framework for skill-aware orchestration. Instead of directly learning a routing policy end-to-end, SkillOrchestra learns fine-grained skills from execution experience and models agent-specific competence and cost under those skills. At deployment, the orchestrator infers the skill demands of the current interaction and selects agents that best satisfy them under an explicit performance-cost trade-off. Extensive experiments across ten benchmarks demonstrate that SkillOrchestra outperforms SoTA RL-based orchestrators by up to 22.5% with 700x and 300x learning cost reduction compared to Router-R1 and ToolOrchestra, respectively. These results show that explicit skill modeling enables scalable, interpretable, and sample-efficient orchestration, offering a principled alternative to data-intensive RL-based approaches. The code is available at: https://github.com/jiayuww/SkillOrchestra.
Test-Time Scaling Makes Overtraining Compute-Optimal
arXiv (Cornell University) · 2026-04-01
preprintOpen accessModern LLMs scale at test-time, e.g. via repeated sampling, where inference cost grows with model size and the number of samples. This creates a trade-off that pretraining scaling laws, such as Chinchilla, do not address. We present Train-to-Test ($T^2$) scaling laws that jointly optimize model size, training tokens, and number of inference samples under fixed end-to-end budgets. $T^2$ modernizes pretraining scaling laws with pass@$k$ modeling used for test-time scaling, then jointly optimizes pretraining and test-time decisions. Forecasts from $T^2$ are robust over distinct modeling approaches: measuring joint scaling effect on the task loss and modeling impact on task accuracy. Across eight downstream tasks, we find that when accounting for inference cost, optimal pretraining decisions shift radically into the overtraining regime, well-outside of the range of standard pretraining scaling suites. We validate our results by pretraining heavily overtrained models in the optimal region that $T^2$ scaling forecasts, confirming their substantially stronger performance compared to pretraining scaling alone. Finally, as frontier LLMs are post-trained, we show that our findings survive the post-training stage, making $T^2$ scaling meaningful in modern deployments.
SkillOrchestra: Learning to Route Agents via Skill Transfer
arXiv (Cornell University) · 2026-01-01
articleOpen accessCompound AI systems promise capabilities beyond those of individual models, yet their success depends critically on effective orchestration. Existing routing approaches face two limitations: (1) input-level routers make coarse query-level decisions that ignore evolving task requirements; (2) RL-trained orchestrators are expensive to adapt and often suffer from routing collapse, repeatedly invoking one strong but costly option in multi-turn scenarios. We introduce SkillOrchestra, a framework for skill-aware orchestration. Instead of directly learning a routing policy end-to-end, SkillOrchestra learns fine-grained skills from execution experience and models agent-specific competence and cost under those skills. At deployment, the orchestrator infers the skill demands of the current interaction and selects agents that best satisfy them under an explicit performance-cost trade-off. Extensive experiments across ten benchmarks demonstrate that SkillOrchestra outperforms SoTA RL-based orchestrators by up to 22.5% with 700x and 300x learning cost reduction compared to Router-R1 and ToolOrchestra, respectively. These results show that explicit skill modeling enables scalable, interpretable, and sample-efficient orchestration, offering a principled alternative to data-intensive RL-based approaches. The code is available at: https://github.com/jiayuww/SkillOrchestra.
Analyzing Decoders for Quantum Error Correction
arXiv (Cornell University) · 2026-03-20
preprintOpen accessSenior authorQuantum error correction (QEC) enables reliable computation on noisy hardware by encoding logical information across many physical qubits and periodically measuring parities to detect errors. A decoder is the classical algorithm that uses these measurements to infer which error most likely occurred, so that the system can correct it. The decoder's accuracy-how rarely it makes the wrong guess-directly determines the scale of quantum computation that can be reliably executed. With a wealth of competing decoding algorithms, a QEC system designer needs reliable methods to evaluate them. Today, the dominant approach is to evaluate decoders using Monte Carlo simulation. However, simulation has several drawbacks such as requiring many samples to produce low variance estimates. In this work, we develop a new systematic analysis for evaluating decoders. We introduce a novel formal semantics of a core language for QEC programs that captures the de facto standard Stim circuit format, providing a principled theoretical foundation for the emerging space of fault-tolerant quantum systems design. Given a QEC program and a decoder, our verifier can quantify both the decoder accuracy and the decoder robustness to drift in physical error rate. Our approach has two key components: (i) a structured search over the space of possible errors; and (ii) a constrained polynomial optimization kernel. A thorough empirical evaluation of our approach suggests that it can outperform simulation, especially in low error rate regimes, and that it can be deployed to quantify decoder robustness over an interval of physical error rates.
Generating Compilers for Qubit Mapping and Routing
Proceedings of the ACM on Programming Languages · 2026-01-08 · 1 citations
articleOpen accessSenior authorTo evaluate a quantum circuit on a quantum processor, one must find a mapping from circuit qubits to processor qubits and plan the instruction execution while satisfying the processor's constraints. This is known as the qubit mapping and routing (QMR) problem. High-quality QMR solutions are key to maximizing the utility of scarce quantum resources and minimizing the probability of logical errors affecting computation. The challenge is that the landscape of quantum processors is incredibly diverse and fast-evolving. Given this diversity, dozens of papers have addressed the QMR problem for different qubit hardware, connectivity constraints, and quantum error correction schemes by a developing a new algorithm for a particular context. We present an alternative approach: automatically generating qubit mapping and routing compilers for arbitrary quantum processors. Though each QMR problem is different, we identify a common core structure—device state machine—that we use to formulate an abstract QMR problem. Our formulation naturally leads to a compact domain-specific language for specifying QMR problems and a powerful parametric algorithm that can be instantiated for any QMR specification. Our thorough evaluation on case studies of important QMR problems shows that generated compilers are competitive with handwritten, specialized compilers in terms of runtime and solution quality.
Analyzing Decoders for Quantum Error Correction
ArXiv.org · 2026-03-20
articleOpen accessSenior authorQuantum error correction (QEC) enables reliable computation on noisy hardware by encoding logical information across many physical qubits and periodically measuring parities to detect errors. A decoder is the classical algorithm that uses these measurements to infer which error most likely occurred, so that the system can correct it. The decoder's accuracy-how rarely it makes the wrong guess-directly determines the scale of quantum computation that can be reliably executed. With a wealth of competing decoding algorithms, a QEC system designer needs reliable methods to evaluate them. Today, the dominant approach is to evaluate decoders using Monte Carlo simulation. However, simulation has several drawbacks such as requiring many samples to produce low variance estimates. In this work, we develop a new systematic analysis for evaluating decoders. We introduce a novel formal semantics of a core language for QEC programs that captures the de facto standard Stim circuit format, providing a principled theoretical foundation for the emerging space of fault-tolerant quantum systems design. Given a QEC program and a decoder, our verifier can quantify both the decoder accuracy and the decoder robustness to drift in physical error rate. Our approach has two key components: (i) a structured search over the space of possible errors; and (ii) a constrained polynomial optimization kernel. A thorough empirical evaluation of our approach suggests that it can outperform simulation, especially in low error rate regimes, and that it can be deployed to quantify decoder robustness over an interval of physical error rates.
Test-Time Scaling Makes Overtraining Compute-Optimal
ArXiv.org · 2026-04-01
articleOpen accessModern LLMs scale at test-time, e.g. via repeated sampling, where inference cost grows with model size and the number of samples. This creates a trade-off that pretraining scaling laws, such as Chinchilla, do not address. We present Train-to-Test ($T^2$) scaling laws that jointly optimize model size, training tokens, and number of inference samples under fixed end-to-end budgets. $T^2$ modernizes pretraining scaling laws with pass@$k$ modeling used for test-time scaling, then jointly optimizes pretraining and test-time decisions. Forecasts from $T^2$ are robust over distinct modeling approaches: measuring joint scaling effect on the task loss and modeling impact on task accuracy. Across eight downstream tasks, we find that when accounting for inference cost, optimal pretraining decisions shift radically into the overtraining regime, well-outside of the range of standard pretraining scaling suites. We validate our results by pretraining heavily overtrained models in the optimal region that $T^2$ scaling forecasts, confirming their substantially stronger performance compared to pretraining scaling alone. Finally, as frontier LLMs are post-trained, we show that our findings survive the post-training stage, making $T^2$ scaling meaningful in modern deployments.
A Case for Elastic Quantum Error Correction Decoders
2026-04-24
articleOpen accessLarge-scale quantum computers promise transformative speedups, but their viability hinges on fast and reliable quantum error correction (QEC). At the center of QEC are decoders—classical algorithms running on hardware such as FPGAs, GPUs, or CPUs that process error syndromes to detect errors every microsecond to preserve fault-tolerance. Quantum processors, therefore, operate not in isolation, but as accelerators tightly coupled with powerful classical digital hardware. A key challenge is that decoder demand fluctuates unpredictably: bursts of activity can require orders of magnitude more decodes than idle periods. Provisioning hardware for the worst case wastes resources, while provisioning for the average case risks catastrophic slowdowns. We show that this mismatch is a systems problem of capacity planning and scheduling, and propose a two-level framework that treats decoders as shared accelerators managed by the quantum operating system. Our approach reduces decoder requirements by 10–40% across fault-tolerant benchmarks, demonstrating that efficient decoder scheduling is essential to making FTQC practical.
Recent grants
CRII: SHF: Optimal Interpolation for Efficient Proof Synthesis
NSF · $175k · 2016–2020
SHF: Medium: Formal Methods for Program Fairness
NSF · $1.0M · 2017–2021
CAREER: Algorithmic Foundations and Modern Applications for Program Synthesis
NSF · $450k · 2017–2024
Frequent coauthors
- 29 shared
Loris D’Antoni
- 16 shared
Arie Gurfinkel
- 14 shared
Samuel Drews
University of Wisconsin–Madison
- 12 shared
Justin Hsu
- 10 shared
Goutham Ramakrishnan
- 9 shared
Somesh Jha
- 9 shared
Paraschos Koutris
University of Wisconsin–Madison
- 9 shared
Calvin Smith
University of Wisconsin–Madison
Labs
madPLPI
The home website for the UW-Madison Programming Languages research group: madPL
Awards & honors
- PLDI Keynote speaker (2026)
- PLDI Distinguished artifact award (2025)
- Amazon research award (2022)
- Class of 1955 Teaching Excellence Award (2022)
- SIGPLAN research highlight for PLDI (2021)
- Resume-aware match score
- Save to shortlist
- AI-drafted outreach
See your match with Aws Albarghouthi
PhdFit ranks faculty by your research interests, methods, and publications — grounded in their actual work, not templates.
- Free to start
- No credit card
- 30-second signup