
Barbara Chapman
Professor · Stony Brook University · Computer Science
Active 1990–2026
About
Dr. Barbara Chapman is a Professor of Applied Mathematics and Statistics and of Computer Science at Stony Brook University, where she is affiliated with the Institute for Advanced Computational Science. She also directs Computer Science and Mathematics Research at Brookhaven National Laboratory. For over 20 years her research has focused on parallel programming interfaces and implementation technology, including significant efforts to develop community standards for parallel programming such as OpenMP, OpenACC, and OpenSHMEM. Her research group created the state-of-the-art OpenUH compiler, which enables practical experimentation with parallel language extensions and implementation techniques. The group also built a reference implementation of the library-based OpenSHMEM programming interface. Dr. Chapman has co-authored over 200 papers and two books. Her current research concentrates primarily on OpenMP, an industry standard for shared-memory parallel programming that has been broadly adopted by the computing community.
Research topics
- Computer Science
- Computer architecture
- Operating system
- Programming language
- Parallel computing
- Machine Learning
- Artificial Intelligence
- Computer engineering
- Software engineering
Selected publications
An OGSI-compliant portal for campus grids
2026-02-11
article · 1st author · Corresponding
Campus grids are virtual organizations made up of resources owned by various labs and departments on a university campus, which can be put to collective use. Several research institutions, including our own, are early adopters of this technology. We describe the structure of our campus grid and introduce the EZ-Grid software, which was developed to enable high-level access to its services. EZ-Grid is currently being revised to comply with emerging standards for grid services. We outline these new standards and explain how we are working to meet them.
A Full Stack Framework for High Performance Quantum-Classical Computing
ArXiv.org · 2025-10-23
preprint · Open access
To address the growing need for scalable integration of High Performance Computing (HPC) and Quantum Computing (QC), we present our HPC-QC full stack framework and its hybrid workload development capability. The framework takes a modular, hardware/device-agnostic approach to software integration. We demonstrate the latest developments in extensible interfaces for quantum programming, dispatching, and compilation within an existing, mature HPC programming environment. Our HPC-QC full stack provides a quantum programming interface library extension that enables high-level, portable invocation of quantum kernels from commercial quantum SDKs. These kernels can be called within an HPC meta-program written in compiled languages (C/C++ and Fortran) or in Python. We are also developing an adaptive circuit knitting hypervisor that partitions large quantum circuits into sub-circuits that fit on smaller noisy quantum devices and classical simulators. At the lower level, we leverage the Cray LLVM-based compilation framework to transform and consume LLVM IR and Quantum IR (QIR) from commercial quantum software frontends, retargeting them to different hardware architectures. We have demonstrated several hybrid HPC-QC workloads on HPE EX supercomputers. These workloads run on multiple nodes using multiple CPUs and GPUs, and they include solving systems of linear equations, quantum optimization, and simulating quantum phase transitions. Together they illustrate the functionality and execution viability of all three components developed so far. This work provides the framework for a unified quantum-classical programming environment built upon the classical HPC software stack (compilers, libraries, parallel runtime, and process scheduling).
DiOMP-Offloading: Toward Portable Distributed Heterogeneous OpenMP
2025-11-07 · 1 citation
article · Senior author
High-performance computing faces rising core counts, increasing heterogeneity, and growing memory bandwidth. These trends complicate programmability, portability, and scalability, while traditional MPI + OpenMP struggles with distributed GPU memory and portable performance. We present DiOMP-Offloading, a framework unifying OpenMP target offloading with a Partitioned Global Address Space (PGAS) model. Built on LLVM-OpenMP and GASNet-EX, it centrally manages global memory and supports symmetric/asymmetric GPU allocations, enabling remote put/get operations. DiOMP also integrates OMPCCL, a portable device-side collective layer that harmonizes allocation lifecycles and address translation across vendor backends. By eliminating separate MPI + X stacks and abstracting replicated device memory and communication logic, DiOMP improves scalability and programmability. Experiments on large-scale NVIDIA A100, Grace Hopper, and AMD MI250X platforms show superior micro-benchmark and application performance, demonstrating that DiOMP-Offloading offers a more portable, scalable, and efficient path for heterogeneous supercomputing.
Discussion of Device-Device Collective Communication in OpenMP Target Offloading
Lecture Notes in Computer Science · 2025-09-28
book-chapter · Senior author
DiOMP-Offloading: Toward Portable Distributed Heterogeneous OpenMP
arXiv (Cornell University) · 2025-06-03
preprint · Open access · Senior author
As core counts and heterogeneity rise in HPC, traditional hybrid programming models face challenges in managing distributed GPU memory and ensuring portability. This paper presents DiOMP, a distributed OpenMP framework that unifies OpenMP target offloading with the Partitioned Global Address Space (PGAS) model. Built atop LLVM/OpenMP and using GASNet-EX or GPI-2 for communication, DiOMP transparently handles global memory, supporting both symmetric and asymmetric GPU allocations. It leverages OMPCCL, a portable collective communication layer compatible with vendor libraries. DiOMP simplifies programming by abstracting device memory and communication, achieving superior scalability and programmability over traditional approaches. Evaluations on NVIDIA A100, Grace Hopper, and AMD MI250X show improved performance in micro-benchmarks and applications like matrix multiplication and Minimod, highlighting DiOMP's potential for scalable, portable, and efficient heterogeneous computing.
Cross-Feature Transfer Learning for Efficient Tensor Program Generation
Applied Sciences · 2024-01-06 · 6 citations
article · Open access · Senior author
Tuning tensor program generation involves navigating a vast search space to find optimal program transformations and measurements for a program on the target hardware. The exponential number of possible transformation combinations makes this process even more complex, especially in heterogeneous environments. This research addresses these challenges by introducing a novel approach that learns the joint feature space of neural networks and hardware, facilitating knowledge transfer to new, unseen target hardware. We conduct a comprehensive analysis of the existing state-of-the-art dataset, TenSet, including a thorough examination of test split strategies, and we propose methodologies for dataset pruning. Leveraging an attention-inspired technique, we tailor the tuning of tensor programs to embed both neural network and hardware-specific features. Notably, our approach reduces the dataset size by up to 53% compared to the baseline without compromising Pairwise Comparison Accuracy (PCA). Furthermore, our proposed methodology achieves competitive or improved mean inference times with only 25–40% of the baseline tuning time across various networks and target hardware. The attention-based tuner can effectively use schedules learned from previous hardware program measurements to optimize tensor program tuning on previously unseen hardware, achieving a top-5 accuracy exceeding 90%. This research represents a significant advancement in autotuning tensor program generation, addressing the complexities of heterogeneous environments and showing promising results for efficiency and accuracy.
How to Build a Quantum Supercomputer: Scaling from Hundreds to Millions of Qubits
ArXiv.org · 2024-11-15 · 5 citations
preprint · Open access
In the span of four decades, quantum computation has evolved from an intellectual curiosity to a potentially realizable technology. Today, small-scale demonstrations have become possible for quantum algorithmic primitives on hundreds of physical qubits. Nevertheless, there are significant outstanding challenges in quantum hardware, fabrication, software architecture, and algorithms on the path towards a full-stack scalable quantum computing technology. Here, we provide a comprehensive review of these scaling challenges. We show how to facilitate scaling by adopting existing semiconductor technology to build much higher-quality qubits, employing systems engineering approaches, and performing distributed heterogeneous quantum-classical computing. We provide a detailed resource and sensitivity analysis for quantum applications on surface-code error-corrected quantum computers given current, target, and desired hardware specifications based on superconducting qubits, accounting for a realistic distribution of errors. We provide comprehensive resource estimates for several utility-scale applications including quantum chemistry calculations, catalyst design, NMR spectroscopy, and Fermi-Hubbard simulation. We show that orders of magnitude enhancement in performance could be obtained by a combination of hardware improvements and tight quantum-HPC integration. Furthermore, we introduce high-performance architectures for quantum-probabilistic computing with custom-designed accelerators to tackle today's industry-scale classical optimization, machine learning, and quantum simulation tasks in a cost-effective manner.
Cross-Feature Transfer Learning For Efficient Tensor Program Generation
Preprints.org · 2024-01-03
preprint · Open access · Senior author
Tuning tensor program generation involves navigating a vast search space to find optimal program transformations and measurements for a program on target hardware. The exponential number of possible transformation combinations makes this process even more complex, especially in heterogeneous environments. This research addresses these challenges by introducing a novel approach that learns the joint feature space of neural networks and hardware, facilitating knowledge transfer to new, unseen target hardware. We conduct a comprehensive analysis of the existing state-of-the-art dataset, TenSet, including a thorough examination of test split strategies, and we propose methodologies for dataset pruning. Leveraging an attention-inspired technique, we tailor the tuning of tensor programs to embed both neural network and hardware-specific features. Notably, our approach reduces the dataset size by up to 53% compared to the baseline without compromising Pairwise Comparison Accuracy (PCA). Furthermore, our proposed methodology achieves competitive or improved mean inference times with only 25–40% of the baseline tuning time across various networks and target hardware. The attention-based tuner can effectively use schedules learned from previous hardware program measurements to optimize tensor program tuning on previously unseen hardware, achieving a top-5 accuracy exceeding 90%. This research represents a significant advancement in auto-tuning tensor program generation, addressing the complexities of heterogeneous environments and showing promising results for efficiency and accuracy.
Lecture Notes in Computer Science · 2024-01-01 · 2 citations
book-chapter · Senior author
Evaluating Tuning Opportunities of the LLVM/OpenMP Runtime
2024-11-17 · 1 citation
article
Tuning parallel applications on multi-core architectures is an arduous task. Several studies have applied auto-tuning to OpenMP applications via standardized user-facing features, namely the number of threads, thread placement, binding, and scheduling policy. However, they fall short of exploiting the additional parameters provided by an OpenMP implementation. In this paper, we analyze OpenMP application runtime through an exhaustive exploration of all relevant configuration options of the LLVM/OpenMP runtime. Our findings allow us to identify trends in tuning potential, derive architecture-aware tuning suggestions, and determine good default configurations per architecture. We will open-source the 240,000 unique samples collected during our experiments for use by the community. These runs were conducted on three different CPU architectures vital to the HPC and datacenter community. The applications include popular benchmark suites and microbenchmarks, namely the NAS Parallel Benchmarks, Barcelona OpenMP Task Suite, XSBench, RSBench, SU3Bench, and LULESH. We employ the Linear Models class of machine learning algorithms to analyze and explain the results. These models form qualitative relations between features comprising the underlying architecture, application, input size, number of threads, and considered environment variables. These relations are further used to recommend configurations for a given application type and architecture.
Recent grants
SPX: Collaborative Research: Cross-layer Application-Aware Resilience at Extreme Scale (CAARES)
NSF · $267k · 2017–2021
Collaborative Research: Extreme OpenMP: A Programming Model for Productive High End Computing
NSF · $629k · 2008–2012
Collaborative Research: Performance Toolset for Dynamic Optimization of High-End Hybrid Applications
NSF · $188k · 2004–2008
Scalable Performance and Power-Aware Hybrid Compilation System for Multicores
NSF · $381k · 2007–2011
Frequent coauthors
- Óscar Hernández · 51 shared
- Brooke M. Smith · 49 shared
- William G. Rixey · 49 shared
- Mark N. Goltz, Wright-Patterson Air Force Base · 49 shared
- Terry Morse, Shell (Japan) · 49 shared
- Ronald Anderson, University of Pretoria · 49 shared
- M. Eileen Dolan · 49 shared
- Adrian D. Fure · 49 shared
Labs
Research Opportunities in Computer Science · PI
Education
- 1998
Ph.D., Computer Science
Queen's University Belfast