Resume-aware faculty matching

Find professors who actually fit you

Upload your resume. Four AI agents analyze your background, rank the faculty who fit, inspect their recent research, and help you draft outreach — grounded in their actual work, not templates.

Free to startNo credit cardCancel anytime
Top matches Balanced preset
Dr. Sarah Chen
Stanford · Interpretability · NLP
91
Dr. Marcus Holloway
MIT · Robotics · RL
84
Dr. Aisha Okonkwo
CMU · Fairness · HCI
82
Nova · Professor Researcher · re-ranking top 20…
Aws Albarghouthi

Aws Albarghouthi

· Associate ProfessorVerified

University of Wisconsin-Madison · Computer Sciences

Active 2010–2026

h-index22
Citations1.8k
Papers11249 last 5y
Funding$1.6M
See your match with Aws Albarghouthi — sign in to PhdFit.Sign in

About

Aws Albarghouthi is an associate professor of computer science at the University of Wisconsin-Madison. His research focuses on the art and science of program synthesis and verification. He is a member of the madPL and quantum computing groups and the Wisconsin Quantum Institute. His group is primarily dedicated to automatically synthesizing the software stack for future quantum computers, addressing the need for a powerful, automated software infrastructure as physicists develop better qubits. This work involves pioneering new methods that diverge from classical computing software stacks to meet the demands of 21st-century quantum hardware. Albarghouthi's interests also extend to AI agent construction and correctness, where his team is working on building agents with guarantees of correctness and security, moving away from ad-hoc, limited-benchmark approaches. His contributions include work on compiler generation, quantum error correction, neural network verification, and the development of tools and benchmarks for deep research. He has been recognized for his leadership and contributions through keynote speeches, awards, and service roles in major conferences.

Research topics

  • Artificial Intelligence
  • Computer Science
  • Mathematics
  • Theoretical computer science
  • Computer Security
  • Programming language
  • Algorithm
  • Parallel computing

Selected publications

  • Linear-Time T-Gate Optimization via Random Abstraction

    ArXiv.org · 2026-05-13

    articleOpen access1st authorCorresponding

    Quantum computers promise exponential speedups for problems in cryptography, chemistry, and optimization. Realizing this promise requires fault tolerance: physical qubits are noisy, so logical qubits must be encoded redundantly across many physical ones using quantum error-correcting codes. In most practical fault-tolerance schemes, T gates cannot be implemented transversally and instead require costly magic-state distillation protocols involving a complex set of operations. As a result, T-gate count can dominate the resource budget of large-scale quantum computations, making T-count minimization a central bottleneck on the path to quantum advantage. Existing T-count optimization tools, however, do not scale to the circuits that quantum advantage demands. We present theoretical and practical results on T-gate optimization. On the theoretical side, we give a linear-time randomized algorithm for phase folding, based on a novel randomized static analysis. Our static analysis soundly approximates the set of reachable quantum states with an arbitrarily high probability. Our key insight is a static analysis that does not track symbolic expressions, but propagates constant-width bitstrings down the circuit. On the practical side, our implementation, TZAP, is multiple orders of magnitude faster than state-of-the-art tools -- such as PyZX, VOQC, and Feynman -- closely matches their T-count reductions on standard benchmarks, and within seconds on a laptop computer can optimize circuits with millions of gates.

  • Linear-Time T-Gate Optimization via Random Abstraction

    arXiv (Cornell University) · 2026-05-13

    preprintOpen access1st authorCorresponding

    Quantum computers promise exponential speedups for problems in cryptography, chemistry, and optimization. Realizing this promise requires fault tolerance: physical qubits are noisy, so logical qubits must be encoded redundantly across many physical ones using quantum error-correcting codes. In most practical fault-tolerance schemes, T gates cannot be implemented transversally and instead require costly magic-state distillation protocols involving a complex set of operations. As a result, T-gate count can dominate the resource budget of large-scale quantum computations, making T-count minimization a central bottleneck on the path to quantum advantage. Existing T-count optimization tools, however, do not scale to the circuits that quantum advantage demands. We present theoretical and practical results on T-gate optimization. On the theoretical side, we give a linear-time randomized algorithm for phase folding, based on a novel randomized static analysis. Our static analysis soundly approximates the set of reachable quantum states with an arbitrarily high probability. Our key insight is a static analysis that does not track symbolic expressions, but propagates constant-width bitstrings down the circuit. On the practical side, our implementation, TZAP, is multiple orders of magnitude faster than state-of-the-art tools -- such as PyZX, VOQC, and Feynman -- closely matches their T-count reductions on standard benchmarks, and within seconds on a laptop computer can optimize circuits with millions of gates.

  • SkillOrchestra: Learning to Route Agents via Skill Transfer

    arXiv (Cornell University) · 2026-02-23

    preprintOpen access

    Compound AI systems promise capabilities beyond those of individual models, yet their success depends critically on effective orchestration. Existing routing approaches face two limitations: (1) input-level routers make coarse query-level decisions that ignore evolving task requirements; (2) RL-trained orchestrators are expensive to adapt and often suffer from routing collapse, repeatedly invoking one strong but costly option in multi-turn scenarios. We introduce SkillOrchestra, a framework for skill-aware orchestration. Instead of directly learning a routing policy end-to-end, SkillOrchestra learns fine-grained skills from execution experience and models agent-specific competence and cost under those skills. At deployment, the orchestrator infers the skill demands of the current interaction and selects agents that best satisfy them under an explicit performance-cost trade-off. Extensive experiments across ten benchmarks demonstrate that SkillOrchestra outperforms SoTA RL-based orchestrators by up to 22.5% with 700x and 300x learning cost reduction compared to Router-R1 and ToolOrchestra, respectively. These results show that explicit skill modeling enables scalable, interpretable, and sample-efficient orchestration, offering a principled alternative to data-intensive RL-based approaches. The code is available at: https://github.com/jiayuww/SkillOrchestra.

  • Test-Time Scaling Makes Overtraining Compute-Optimal

    arXiv (Cornell University) · 2026-04-01

    preprintOpen access

    Modern LLMs scale at test-time, e.g. via repeated sampling, where inference cost grows with model size and the number of samples. This creates a trade-off that pretraining scaling laws, such as Chinchilla, do not address. We present Train-to-Test ($T^2$) scaling laws that jointly optimize model size, training tokens, and number of inference samples under fixed end-to-end budgets. $T^2$ modernizes pretraining scaling laws with pass@$k$ modeling used for test-time scaling, then jointly optimizes pretraining and test-time decisions. Forecasts from $T^2$ are robust over distinct modeling approaches: measuring joint scaling effect on the task loss and modeling impact on task accuracy. Across eight downstream tasks, we find that when accounting for inference cost, optimal pretraining decisions shift radically into the overtraining regime, well-outside of the range of standard pretraining scaling suites. We validate our results by pretraining heavily overtrained models in the optimal region that $T^2$ scaling forecasts, confirming their substantially stronger performance compared to pretraining scaling alone. Finally, as frontier LLMs are post-trained, we show that our findings survive the post-training stage, making $T^2$ scaling meaningful in modern deployments.

  • SkillOrchestra: Learning to Route Agents via Skill Transfer

    arXiv (Cornell University) · 2026-01-01

    articleOpen access

    Compound AI systems promise capabilities beyond those of individual models, yet their success depends critically on effective orchestration. Existing routing approaches face two limitations: (1) input-level routers make coarse query-level decisions that ignore evolving task requirements; (2) RL-trained orchestrators are expensive to adapt and often suffer from routing collapse, repeatedly invoking one strong but costly option in multi-turn scenarios. We introduce SkillOrchestra, a framework for skill-aware orchestration. Instead of directly learning a routing policy end-to-end, SkillOrchestra learns fine-grained skills from execution experience and models agent-specific competence and cost under those skills. At deployment, the orchestrator infers the skill demands of the current interaction and selects agents that best satisfy them under an explicit performance-cost trade-off. Extensive experiments across ten benchmarks demonstrate that SkillOrchestra outperforms SoTA RL-based orchestrators by up to 22.5% with 700x and 300x learning cost reduction compared to Router-R1 and ToolOrchestra, respectively. These results show that explicit skill modeling enables scalable, interpretable, and sample-efficient orchestration, offering a principled alternative to data-intensive RL-based approaches. The code is available at: https://github.com/jiayuww/SkillOrchestra.

  • Analyzing Decoders for Quantum Error Correction

    arXiv (Cornell University) · 2026-03-20

    preprintOpen accessSenior author

    Quantum error correction (QEC) enables reliable computation on noisy hardware by encoding logical information across many physical qubits and periodically measuring parities to detect errors. A decoder is the classical algorithm that uses these measurements to infer which error most likely occurred, so that the system can correct it. The decoder's accuracy-how rarely it makes the wrong guess-directly determines the scale of quantum computation that can be reliably executed. With a wealth of competing decoding algorithms, a QEC system designer needs reliable methods to evaluate them. Today, the dominant approach is to evaluate decoders using Monte Carlo simulation. However, simulation has several drawbacks such as requiring many samples to produce low variance estimates. In this work, we develop a new systematic analysis for evaluating decoders. We introduce a novel formal semantics of a core language for QEC programs that captures the de facto standard Stim circuit format, providing a principled theoretical foundation for the emerging space of fault-tolerant quantum systems design. Given a QEC program and a decoder, our verifier can quantify both the decoder accuracy and the decoder robustness to drift in physical error rate. Our approach has two key components: (i) a structured search over the space of possible errors; and (ii) a constrained polynomial optimization kernel. A thorough empirical evaluation of our approach suggests that it can outperform simulation, especially in low error rate regimes, and that it can be deployed to quantify decoder robustness over an interval of physical error rates.

  • Generating Compilers for Qubit Mapping and Routing

    Proceedings of the ACM on Programming Languages · 2026-01-08 · 1 citations

    articleOpen accessSenior author

    To evaluate a quantum circuit on a quantum processor, one must find a mapping from circuit qubits to processor qubits and plan the instruction execution while satisfying the processor's constraints. This is known as the qubit mapping and routing (QMR) problem. High-quality QMR solutions are key to maximizing the utility of scarce quantum resources and minimizing the probability of logical errors affecting computation. The challenge is that the landscape of quantum processors is incredibly diverse and fast-evolving. Given this diversity, dozens of papers have addressed the QMR problem for different qubit hardware, connectivity constraints, and quantum error correction schemes by a developing a new algorithm for a particular context. We present an alternative approach: automatically generating qubit mapping and routing compilers for arbitrary quantum processors. Though each QMR problem is different, we identify a common core structure—device state machine—that we use to formulate an abstract QMR problem. Our formulation naturally leads to a compact domain-specific language for specifying QMR problems and a powerful parametric algorithm that can be instantiated for any QMR specification. Our thorough evaluation on case studies of important QMR problems shows that generated compilers are competitive with handwritten, specialized compilers in terms of runtime and solution quality.

  • Analyzing Decoders for Quantum Error Correction

    ArXiv.org · 2026-03-20

    articleOpen accessSenior author

    Quantum error correction (QEC) enables reliable computation on noisy hardware by encoding logical information across many physical qubits and periodically measuring parities to detect errors. A decoder is the classical algorithm that uses these measurements to infer which error most likely occurred, so that the system can correct it. The decoder's accuracy-how rarely it makes the wrong guess-directly determines the scale of quantum computation that can be reliably executed. With a wealth of competing decoding algorithms, a QEC system designer needs reliable methods to evaluate them. Today, the dominant approach is to evaluate decoders using Monte Carlo simulation. However, simulation has several drawbacks such as requiring many samples to produce low variance estimates. In this work, we develop a new systematic analysis for evaluating decoders. We introduce a novel formal semantics of a core language for QEC programs that captures the de facto standard Stim circuit format, providing a principled theoretical foundation for the emerging space of fault-tolerant quantum systems design. Given a QEC program and a decoder, our verifier can quantify both the decoder accuracy and the decoder robustness to drift in physical error rate. Our approach has two key components: (i) a structured search over the space of possible errors; and (ii) a constrained polynomial optimization kernel. A thorough empirical evaluation of our approach suggests that it can outperform simulation, especially in low error rate regimes, and that it can be deployed to quantify decoder robustness over an interval of physical error rates.

  • Test-Time Scaling Makes Overtraining Compute-Optimal

    ArXiv.org · 2026-04-01

    articleOpen access

    Modern LLMs scale at test-time, e.g. via repeated sampling, where inference cost grows with model size and the number of samples. This creates a trade-off that pretraining scaling laws, such as Chinchilla, do not address. We present Train-to-Test ($T^2$) scaling laws that jointly optimize model size, training tokens, and number of inference samples under fixed end-to-end budgets. $T^2$ modernizes pretraining scaling laws with pass@$k$ modeling used for test-time scaling, then jointly optimizes pretraining and test-time decisions. Forecasts from $T^2$ are robust over distinct modeling approaches: measuring joint scaling effect on the task loss and modeling impact on task accuracy. Across eight downstream tasks, we find that when accounting for inference cost, optimal pretraining decisions shift radically into the overtraining regime, well-outside of the range of standard pretraining scaling suites. We validate our results by pretraining heavily overtrained models in the optimal region that $T^2$ scaling forecasts, confirming their substantially stronger performance compared to pretraining scaling alone. Finally, as frontier LLMs are post-trained, we show that our findings survive the post-training stage, making $T^2$ scaling meaningful in modern deployments.

  • A Case for Elastic Quantum Error Correction Decoders

    2026-04-24

    articleOpen access

    Large-scale quantum computers promise transformative speedups, but their viability hinges on fast and reliable quantum error correction (QEC). At the center of QEC are decoders—classical algorithms running on hardware such as FPGAs, GPUs, or CPUs that process error syndromes to detect errors every microsecond to preserve fault-tolerance. Quantum processors, therefore, operate not in isolation, but as accelerators tightly coupled with powerful classical digital hardware. A key challenge is that decoder demand fluctuates unpredictably: bursts of activity can require orders of magnitude more decodes than idle periods. Provisioning hardware for the worst case wastes resources, while provisioning for the average case risks catastrophic slowdowns. We show that this mismatch is a systems problem of capacity planning and scheduling, and propose a two-level framework that treats decoders as shared accelerators managed by the quantum operating system. Our approach reduces decoder requirements by 10–40% across fault-tolerant benchmarks, demonstrating that efficient decoder scheduling is essential to making FTQC practical.

Recent grants

Frequent coauthors

  • Loris D’Antoni

    29 shared
  • Arie Gurfinkel

    16 shared
  • Samuel Drews

    University of Wisconsin–Madison

    14 shared
  • Justin Hsu

    12 shared
  • Goutham Ramakrishnan

    10 shared
  • Somesh Jha

    9 shared
  • Paraschos Koutris

    University of Wisconsin–Madison

    9 shared
  • Calvin Smith

    University of Wisconsin–Madison

    9 shared

Labs

  • madPLPI

    The home website for the UW-Madison Programming Languages research group: madPL

Awards & honors

  • PLDI Keynote speaker (2026)
  • PLDI Distinguished artifact award (2025)
  • Amazon research award (2022)
  • Class of 1955 Teaching Excellence Award (2022)
  • SIGPLAN research highlight for PLDI (2021)
  • Resume-aware match score
  • Save to shortlist
  • AI-drafted outreach

See your match with Aws Albarghouthi

PhdFit ranks faculty by your research interests, methods, and publications — grounded in their actual work, not templates.

  • Free to start
  • No credit card
  • 30-second signup