Barbara Chapman

Professor · Verified

Stony Brook University · Computer Science

Active 1990–2026

h-index: 33
Citations: 7.3k
Papers: 387 (61 in the last 5 years)
Funding: $1.5M
About

Dr. Barbara Chapman is a Professor of Applied Mathematics and Statistics, and of Computer Science at Stony Brook University, where she is affiliated with the Institute for Advanced Computational Science. She also directs Computer Science and Mathematics Research at Brookhaven National Laboratory. Her research has focused on parallel programming interfaces and implementation technology for over 20 years, with significant efforts in developing community standards for parallel programming, including OpenMP, OpenACC, and OpenSHMEM. Her research group created the state-of-the-art OpenUH compiler, enabling practical experimentation with parallel language extensions and implementation techniques, as well as a reference implementation of the library-based OpenSHMEM programming interface. Dr. Chapman has co-authored over 200 papers and two books. Her current research primarily concentrates on OpenMP, an industry standard for shared memory parallel programming that has been broadly accepted by the computing community.

Research topics

  • Computer Science
  • Computer architecture
  • Operating system
  • Programming language
  • Parallel computing
  • Machine Learning
  • Artificial Intelligence
  • Computer engineering
  • Software engineering

Selected publications

  • An OGSI-compliant portal for campus grids

    2026-02-11

    article · 1st author · Corresponding

    Campus grids are virtual organizations consisting of resources owned by various labs and departments on a university campus that may be collectively put to use. Several research institutions including our own are early adopters of this technology. We describe the structure of our campus grid and introduce the EZ-Grid software that has been developed to enable high-level access to its services. EZ-Grid is currently undergoing revision to ensure that it will comply with emerging standards for grid services. We outline these new standards, and explain how we are working to meet them.

  • A Full Stack Framework for High Performance Quantum-Classical Computing

    ArXiv.org · 2025-10-23

    preprint · Open access

    To address the growing needs for scalable High Performance Computing (HPC) and Quantum Computing (QC) integration, we present our HPC-QC full stack framework and its hybrid workload development capability with modular hardware/device-agnostic software integration approach. The latest development in extensible interfaces for quantum programming, dispatching, and compilation within existing mature HPC programming environment are demonstrated. Our HPC-QC full stack enables high-level, portable invocation of quantum kernels from commercial quantum SDKs within HPC meta-program in compiled languages (C/C++ and Fortran) as well as Python through a quantum programming interface library extension. An adaptive circuit knitting hypervisor is being developed to partition large quantum circuits into sub-circuits that fit on smaller noisy quantum devices and classical simulators. At the lower-level, we leverage Cray LLVM-based compilation framework to transform and consume LLVM IR and Quantum IR (QIR) from commercial quantum software frontends in a retargetable fashion to different hardware architectures. Several hybrid HPC-QC multi-node multi-CPU and GPU workloads (including solving linear system of equations, quantum optimization, and simulating quantum phase transitions) have been demonstrated on HPE EX supercomputers to illustrate functionality and execution viability for all three components developed so far. This work provides the framework for a unified quantum-classical programming environment built upon classical HPC software stack (compilers, libraries, parallel runtime and process scheduling).

  • DiOMP-Offloading: Toward Portable Distributed Heterogeneous OpenMP

    2025-11-07 · 1 citation

    article · Senior author

    High-performance computing faces rising core counts, increasing heterogeneity, and growing memory bandwidth. These trends complicate programmability, portability, and scalability, while traditional MPI + OpenMP struggles with distributed GPU memory and portable performance. We present DiOMP-Offloading, a framework unifying OpenMP target offloading with a Partitioned Global Address Space (PGAS) model. Built on LLVM-OpenMP and GASNet-EX, it centrally manages global memory and supports symmetric/asymmetric GPU allocations, enabling remote put/get operations. DiOMP also integrates OMPCCL, a portable device-side collective layer that harmonizes allocation lifecycles and address translation across vendor backends. By eliminating separate MPI + X stacks and abstracting replicated device memory and communication logic, DiOMP improves scalability and programmability. Experiments on large-scale NVIDIA A100, Grace Hopper, and AMD MI250X platforms show superior micro-benchmark and application performance, demonstrating that DiOMP-Offloading offers a more portable, scalable, and efficient path for heterogeneous supercomputing.

  • Discussion of Device-Device Collective Communication in OpenMP Target Offloading

    Lecture notes in computer science · 2025-09-28

    book-chapter · Senior author
  • DiOMP-Offloading: Toward Portable Distributed Heterogeneous OpenMP

    arXiv (Cornell University) · 2025-06-03

    preprint · Open access · Senior author

    As core counts and heterogeneity rise in HPC, traditional hybrid programming models face challenges in managing distributed GPU memory and ensuring portability. This paper presents DiOMP, a distributed OpenMP framework that unifies OpenMP target offloading with the Partitioned Global Address Space (PGAS) model. Built atop LLVM/OpenMP and using GASNet-EX or GPI-2 for communication, DiOMP transparently handles global memory, supporting both symmetric and asymmetric GPU allocations. It leverages OMPCCL, a portable collective communication layer compatible with vendor libraries. DiOMP simplifies programming by abstracting device memory and communication, achieving superior scalability and programmability over traditional approaches. Evaluations on NVIDIA A100, Grace Hopper, and AMD MI250X show improved performance in micro-benchmarks and applications like matrix multiplication and Minimod, highlighting DiOMP's potential for scalable, portable, and efficient heterogeneous computing.

  • Cross-Feature Transfer Learning for Efficient Tensor Program Generation

    Applied Sciences · 2024-01-06 · 6 citations

    article · Open access · Senior author

    Tuning tensor program generation involves navigating a vast search space to find optimal program transformations and measurements for a program on the target hardware. The complexity of this process is further amplified by the exponential combinations of transformations, especially in heterogeneous environments. This research addresses these challenges by introducing a novel approach that learns the joint neural network and hardware features space, facilitating knowledge transfer to new, unseen target hardware. A comprehensive analysis is conducted on the existing state-of-the-art dataset, TenSet, including a thorough examination of test split strategies and the proposal of methodologies for dataset pruning. Leveraging an attention-inspired technique, we tailor the tuning of tensor programs to embed both neural network and hardware-specific features. Notably, our approach substantially reduces the dataset size by up to 53% compared to the baseline without compromising Pairwise Comparison Accuracy (PCA). Furthermore, our proposed methodology demonstrates competitive or improved mean inference times with only 25–40% of the baseline tuning time across various networks and target hardware. The attention-based tuner can effectively utilize schedules learned from previous hardware program measurements to optimize tensor program tuning on previously unseen hardware, achieving a top-5 accuracy exceeding 90%. This research introduces a significant advancement in autotuning tensor program generation, addressing the complexities associated with heterogeneous environments and showcasing promising results regarding efficiency and accuracy.

  • How to Build a Quantum Supercomputer: Scaling from Hundreds to Millions of Qubits

    ArXiv.org · 2024-11-15 · 5 citations

    preprint · Open access

    In the span of four decades, quantum computation has evolved from an intellectual curiosity to a potentially realizable technology. Today, small-scale demonstrations have become possible for quantum algorithmic primitives on hundreds of physical qubits. Nevertheless, there are significant outstanding challenges in quantum hardware, fabrication, software architecture, and algorithms on the path towards a full-stack scalable quantum computing technology. Here, we provide a comprehensive review of these scaling challenges. We show how to facilitate scaling by adopting existing semiconductor technology to build much higher-quality qubits, employing systems engineering approaches, and performing distributed heterogeneous quantum-classical computing. We provide a detailed resource and sensitivity analysis for quantum applications on surface-code error-corrected quantum computers given current, target, and desired hardware specifications based on superconducting qubits, accounting for a realistic distribution of errors. We provide comprehensive resource estimates for several utility-scale applications including quantum chemistry calculations, catalyst design, NMR spectroscopy, and Fermi-Hubbard simulation. We show that orders of magnitude enhancement in performance could be obtained by a combination of hardware improvements and tight quantum-HPC integration. Furthermore, we introduce high-performance architectures for quantum-probabilistic computing with custom-designed accelerators to tackle today's industry-scale classical optimization, machine learning, and quantum simulation tasks in a cost-effective manner.

  • Cross-Feature Transfer Learning For Efficient Tensor Program Generation

    Preprints.org · 2024-01-03

    preprint · Open access · Senior author

    Tuning tensor program generation involves navigating a vast search space to find optimal program transformations and measurements for a program on target hardware. The complexity of this process is further amplified by the exponential combinations of transformations, especially in heterogeneous environments. This research addresses these challenges by introducing a novel approach that learns the joint neural network and hardware features space, facilitating knowledge transfer to new, unseen target hardware. A comprehensive analysis is conducted on the existing state-of-the-art dataset, TenSet, including a thorough examination of test split strategies and the proposal of methodologies for dataset pruning. Leveraging an attention-inspired technique, we tailor the tuning of tensor programs to embed both neural network and hardware-specific features. Notably, our approach substantially reduces the dataset size by up to 53% compared to the baseline without compromising Pairwise Comparison Accuracy (PCA). Furthermore, our proposed methodology demonstrates competitive or improved mean inference times with only 25–40% of the baseline tuning time across various networks and target hardware. The attention-based tuner can effectively utilize schedules learned from previous hardware program measurements to optimize tensor program tuning on previously unseen hardware, achieving a top-5 accuracy exceeding 90%. This research introduces a significant advancement in auto-tuning tensor program generation, addressing the complexities associated with heterogeneous environments and showcasing promising results regarding efficiency and accuracy.

  • Evaluation of Directive-Based Programming Models for Stencil Computation on Current GPGPU Architectures

    Lecture notes in computer science · 2024-01-01 · 2 citations

    book-chapter · Senior author
  • Evaluating Tuning Opportunities of the LLVM/OpenMP Runtime

    2024-11-17 · 1 citation

    article

    Tuning parallel applications on multi-core architectures is an arduous task. Several studies have utilized auto-tuning for OpenMP applications via standardized user-facing features, namely number of threads, thread placement, binding and scheduling policy. However, they fall short on utilizing the additional parameters provided by an OpenMP implementation. In this paper, we analyze OpenMP application runtime through an exhaustive exploration of all relevant configuration options of the LLVM/OpenMP runtime. Our findings allow to identify trends in tuning potential, architecture-aware tuning suggestions, and good default configurations per architecture. We will open-source the 240,000 unique samples collected during experiments for use by the community. These runs have been conducted on three different CPU architectures vital in the HPC and datacenter community. Choice of applications includes popular benchmark suites and microbenchmarks, namely NAS Parallel Benchmarks, Barcelona OpenMP Task Suite, XSBench, RSBench, SU3Bench and LULESH. We employ the Linear Models class of Machine Learning algorithms to perform analysis, explain, and form qualitative relations between features comprising of the underlying architecture, application, input size, number of threads, and considered environment variables. This is further used to recommend different configurations given an application type/architecture.

Frequent coauthors

  • Óscar Hernández

    51 shared
  • Brooke M. Smith

    49 shared
  • William G. Rixey

    49 shared
  • Mark N. Goltz

    Wright-Patterson Air Force Base

    49 shared
  • Terry Morse

    Shell (Japan)

    49 shared
  • Ronald Anderson

    University of Pretoria

    49 shared
  • M. Eileen Dolan

    49 shared
  • Adrian D. Fure

    49 shared

Labs

  • Research Opportunities in Computer Science · PI

Education

  • Ph.D., Computer Science

    Queen's University Belfast

    1998