Peter A. Beerel

· Professor of Electrical and Computer EngineeringVerified

University of Southern California · Ming Hsieh Department of Electrical and Computer Engineering

Active 1991–2026

h-index33

Citations4.5k

Papers404163 last 5y

Funding$700k

Faculty page Lab page Website

See your match with Peter A. Beerel — sign in to PhdFit.Sign in

About

Professor Peter A. Beerel leads the Energy Efficient Secure Sustainable Computing (E2S2C) group at the University of Southern California. His research spans circuits, micro-architecture, and algorithms with a focus on emerging areas in energy-efficient, secure, and sustainable computing. The group operates under principles of academic curiosity, integrity, and collaboration to address real-world problems through the mathematical foundations of Electrical and Computer Engineering. Current research projects under his leadership include machine-learning algorithm hardware co-design, superconducting electronics, hardware security, and asynchronous VLSI design. Additionally, the group engages in multidisciplinary collaborations, including efforts to mitigate wildfires. Professor Beerel's group actively seeks postdoctoral researchers, PhD candidates, and strong master's students to contribute to directed research in these areas.

Research topics

Computer Science
Engineering
Artificial Intelligence
Electrical engineering

Selected publications

MemRoPE: Training-Free Infinite Video Generation via Evolving Memory Tokens
arXiv (Cornell University) · 2026-03-12
preprintOpen accessSenior author
Autoregressive diffusion enables real-time frame streaming, yet existing sliding-window caches discard past context, causing fidelity degradation, identity drift, and motion stagnation over long horizons. Current approaches preserve a fixed set of early tokens as attention sinks, but this static anchor cannot reflect the evolving content of a growing video. We introduce MemRoPE, a training-free framework with two co-designed components. Memory Tokens continuously compress all past keys into dual long-term and short-term streams via exponential moving averages, maintaining both global identity and recent dynamics within a fixed-size cache. Online RoPE Indexing caches unrotated keys and applies positional embeddings dynamically at attention time, ensuring the aggregation is free of conflicting positional phases. These two mechanisms are mutually enabling: positional decoupling makes temporal aggregation well-defined, while aggregation makes fixed-size caching viable for unbounded generation. Extensive experiments validate that MemRoPE outperforms existing methods in temporal coherence, visual fidelity, and subject consistency across minute- to hour-scale generation.
Publisher DOI
Generalizable Verilog Modeling Framework for Synchronous and Asynchronous Superconducting Pulse-Based Logic Gates
ArXiv.org · 2026-03-26
articleOpen accessSenior author
Superconducting Single Flux Quantum (SFQ) logic offers a promising platform for ultra-low-power, high-frequency computing. However, their pulse-based nature poses challenges for scalable modeling, design, and verification using conventional hardware description languages (HDLs), which are designed for level-based digital logic. Prior efforts have required complex Verilog support modules to enable Standard Delay Format (SDF) compatibility and have provided limited coverage of SFQ cell types. This work presents a Verilog-based modeling framework for SFQ gates that enables functional and timing verification while maintaining compatibility with Standard Delay Format (SDF) back annotation and is the first framework to support both synchronous and asynchronous SFQ gates. The proposed models are validated through device-level simulations, demonstrating correct functionality and timing constraint coverage. RTL simulation of mixed synchronous-asynchronous circuits further demonstrate the utility of the proposed framework.
Publisher OA PDF
qPRO-AQFP: Post-Routing Optimization of AQFP Circuits with Delay Line Clocking
arXiv (Cornell University) · 2026-04-09
preprintOpen accessSenior author
Adiabatic Quantum-Flux-Parametron (AQFP) logic is an ultra-low-power superconducting logic family with energy consumption approaching the Shannon limit, making it attractive for quantum computing control and cryogenic computing systems. Traditional AQFP designs face significant physical design challenges due to strict gate-level clocking requirements and limited interconnect lengths, leading to substantial buffer overhead and difficult timing closure. Recently, delay-line clocking of AQFP has been proposed to improve timing margins and reduce latency by enabling more flexible clock scheduling. However, prior work has primarily focused on placement and latency minimization, while relying on fixed timing parameters that do not capture the frequency dependence of AQFP setup and hold constraints. To address this limitation, we propose a frequency-aware post-routing optimization framework that jointly optimizes clock period, latency, and timing slack under user-specified weighting. Experimental results across common benchmarks achieve 100% post-routing timing closure across a range of performance--latency--slack trade-offs. Our approach also automates phase-skipping, reducing path-balancing buffer insertion by 34% on average while only reducing operating frequency by 4%.
Publisher DOI
Generalizable Verilog Modeling Framework for Synchronous and Asynchronous Superconducting Pulse-Based Logic Gates
arXiv (Cornell University) · 2026-03-26
preprintOpen accessSenior author
Superconducting Single Flux Quantum (SFQ) logic offers a promising platform for ultra-low-power, high-frequency computing. However, their pulse-based nature poses challenges for scalable modeling, design, and verification using conventional hardware description languages (HDLs), which are designed for level-based digital logic. Prior efforts have required complex Verilog support modules to enable Standard Delay Format (SDF) compatibility and have provided limited coverage of SFQ cell types. This work presents a Verilog-based modeling framework for SFQ gates that enables functional and timing verification while maintaining compatibility with Standard Delay Format (SDF) back annotation and is the first framework to support both synchronous and asynchronous SFQ gates. The proposed models are validated through device-level simulations, demonstrating correct functionality and timing constraint coverage. RTL simulation of mixed synchronous-asynchronous circuits further demonstrate the utility of the proposed framework.
Publisher DOI
Branch Landing: Bloom Filter-Based Source Authorization for Forward-Edge CFI on RISC-V
arXiv (Cornell University) · 2026-04-25
preprintOpen accessSenior author
Jump-Oriented Programming (JOP) attacks exploit indirect control transfers to bypass backward-edge defenses, yet existing forward-edge CFI mechanisms lack precise source-domain authorization: type-based CFI admits all same-signature callers, while tag-based hardware CFI is limited by fixed-width register storage that caps the number of simultaneously authorized sources. We propose Branch Landing (BRL), a landing-based forward-edge CFI framework for RISC-V that replaces fixed-capacity checks with Bloom filter membership queries. Two lightweight ISA extensions, bld and brl, propagate a source Section Identifier (SID) through a dedicated BRState register and validate it at each landing site with fixed-probe latency that is independent of the number of authorized sources under a chosen filter configuration. Section granularity is configurable, supporting policies from type-based to CFG-derived authorization within a single mechanism. We implement Branch Landing in the LLVM RISC-V backend and evaluate it on 81 BEEBS benchmarks under two representative policy configurations: a function-level, type-based policy and a basic-block-level, CFG-derived policy. Under a 3-cycle brl latency model, the two configurations incur average runtime overheads of only 0.210% and 0.421%, with mean code size growth of 0.46% and 0.52% respectively. The CFG-derived policy reduces the average equivalence class size by 32.5% compared to the type-based policy, and all evaluated executions complete without BRL enforcement failures.
Publisher DOI
Optimizing Phase-Scheduling with Throughput Trade-offs in AQFP Digital Circuits
IEEE Transactions on Applied Superconductivity · 2026-01-01
articleOpen accessSenior author
Adiabatic Quantum-Flux-Parametron (AQFP) logic is a promising emerging superconducting technology for ultralow power digital circuits, offering orders of magnitude lower power consumption than CMOS. However, AQFP scalability is challenged by excessive buffer overhead due to path balancing technology constraints. Addressing this, recent AQFP works have proposed design solutions to reduce path balancing overhead using phase-skipping and phase-alignment. Phase-skipping is a circuit-level technique that allows data transfer between AQFP gates clocked with non-consecutive clock phases. In contrast, phase-alignment is an architectural approach involving repeating input patterns to allow data transfer between AQFP gates across multiples of full clock cycles. While both techniques individually mitigate the area overhead of path-balancing, they have not yet been jointly explored. In this work, we present the first clock phase scheduling algorithm that combines phase-skipping and phase-alignment. We first present a minimum area method that on average, achieves a 25% area reduction compared to phase-skipping alone and a 11% reduction compared to phase alignment. We then extend the method to enforce a target throughput, enabling efficient area-performance trade-offs. With our throughput constrained optimization, we achieve on average 6.8% area savings with a 2.62x increased throughput compared to the state-of-the-art phase-aligned method.
Publisher OA PDF DOI
MemRoPE: Training-Free Infinite Video Generation via Evolving Memory Tokens
arXiv (Cornell University) · 2026-03-12
articleOpen accessSenior author
Autoregressive diffusion enables real-time frame streaming, yet existing sliding-window caches discard past context, causing fidelity degradation, identity drift, and motion stagnation over long horizons. Current approaches preserve a fixed set of early tokens as attention sinks, but this static anchor cannot reflect the evolving content of a growing video. We introduce MemRoPE, a training-free framework with two co-designed components. Memory Tokens continuously compress all past keys into dual long-term and short-term streams via exponential moving averages, maintaining both global identity and recent dynamics within a fixed-size cache. Online RoPE Indexing caches unrotated keys and applies positional embeddings dynamically at attention time, ensuring the aggregation is free of conflicting positional phases. These two mechanisms are mutually enabling: positional decoupling makes temporal aggregation well-defined, while aggregation makes fixed-size caching viable for unbounded generation. Extensive experiments validate that MemRoPE outperforms existing methods in temporal coherence, visual fidelity, and subject consistency across minute- to hour-scale generation.
Publisher OA PDF
Breaking TinyML: Why Quantized Neural Networks Need Domain-Specific Security Analysis
IEEE Micro · 2026-01-01
article
Most TinyML hardware focus on supporting Quantized Neural Networks (QNNs) to meet stringent constraints on power consumption, size, and cost. Despite this, the security aspects of quantization within TinyML hardware remain largely unexplored. Although previous studies indicate that QNNs demonstrate similar or enhanced robustness when compared to full-precision Deep Neural Networks (DNNs) against typical evasion attacks, no attack strategies tailored specifically for TinyML hardware have been proposed yet. This paper addresses the aforementioned shortfall by demonstrating how a two-step attack pipeline can surpass the current state-of-the-art in the QNN context and shows the need for more hardware-aware security research.
Publisher DOI
Branch Landing: Bloom Filter-Based Source Authorization for Forward-Edge CFI on RISC-V
ArXiv.org · 2026-04-25
articleOpen accessSenior author
Jump-Oriented Programming (JOP) attacks exploit indirect control transfers to bypass backward-edge defenses, yet existing forward-edge CFI mechanisms lack precise source-domain authorization: type-based CFI admits all same-signature callers, while tag-based hardware CFI is limited by fixed-width register storage that caps the number of simultaneously authorized sources. We propose Branch Landing (BRL), a landing-based forward-edge CFI framework for RISC-V that replaces fixed-capacity checks with Bloom filter membership queries. Two lightweight ISA extensions, bld and brl, propagate a source Section Identifier (SID) through a dedicated BRState register and validate it at each landing site with fixed-probe latency that is independent of the number of authorized sources under a chosen filter configuration. Section granularity is configurable, supporting policies from type-based to CFG-derived authorization within a single mechanism. We implement Branch Landing in the LLVM RISC-V backend and evaluate it on 81 BEEBS benchmarks under two representative policy configurations: a function-level, type-based policy and a basic-block-level, CFG-derived policy. Under a 3-cycle brl latency model, the two configurations incur average runtime overheads of only 0.210% and 0.421%, with mean code size growth of 0.46% and 0.52% respectively. The CFG-derived policy reduces the average equivalence class size by 32.5% compared to the type-based policy, and all evaluated executions complete without BRL enforcement failures.
Publisher OA PDF
NeVStereo: A NeRF-Driven NVS-Stereo Architecture for High-Fidelity 3D Tasks
Open MIND · 2026-02-05
preprint
In modern dense 3D reconstruction, feed-forward systems (e.g., VGGT, pi3) focus on end-to-end matching and geometry prediction but do not explicitly output the novel view synthesis (NVS). Neural rendering-based approaches offer high-fidelity NVS and detailed geometry from posed images, yet they typically assume fixed camera poses and can be sensitive to pose errors. As a result, it remains non-trivial to obtain a single framework that can offer accurate poses, reliable depth, high-quality rendering, and accurate 3D surfaces from casually captured views. We present NeVStereo, a NeRF-driven NVS-stereo architecture that aims to jointly deliver camera poses, multi-view depth, novel view synthesis, and surface reconstruction from multi-view RGB-only inputs. NeVStereo combines NeRF-based NVS for stereo-friendly renderings, confidence-guided multi-view depth estimation, NeRF-coupled bundle adjustment for pose refinement, and an iterative refinement stage that updates both depth and the radiance field to improve geometric consistency. This design mitigated the common NeRF-based issues such as surface stacking, artifacts, and pose-depth coupling. Across indoor, outdoor, tabletop, and aerial benchmarks, our experiments indicate that NeVStereo achieves consistently strong zero-shot performance, with up to 36% lower depth error, 10.4% improved pose accuracy, 4.5% higher NVS fidelity, and state-of-the-art mesh quality (F1 91.93%, Chamfer 4.35 mm) compared to existing prestigious methods.
DOI

Recent grants

SHF: Small: Methodology, Tools, and Circuits for Bundled-Data Resilient Asynchronous Design
NSF · $400k · 2016–2020
SHF: Small: Reconditioning: Optimizing Conditional Communication in Asynchronous Design
NSF · $300k · 2011–2015

Frequent coauthors

Gourav Datta
70 shared
Recep O. Ozdag
48 shared
Souvik Kundu
41 shared
Marcos Ferretti
40 shared
K.M. Chugg
35 shared
Souvik Kundu
Intel (United States)
35 shared
Massoud Pedram
University of Southern California
35 shared
Ellen Randall
Institute of Electrical and Electronics Engineers
32 shared

Labs

USC + Amazon Center on Secure & Trusted MLPI

Education

Ph.D., Electrical Engineering
University of Southern California
1990
M.S., Electrical Engineering
University of Southern California
1986
B.S., Electrical Engineering
University of Southern California
1984

Awards & honors

VSoE Outstanding Teaching Award (1997)
VSoE Junior Research Award (1998)
NSF CAREER Award (1995)
Zumberge Fellowship (1995)
IEEE Region 6 Outstanding Engineer Award (2008)

Resume-aware match score
Save to shortlist
AI-drafted outreach

See your match with Peter A. Beerel

PhdFit ranks faculty by your research interests, methods, and publications — grounded in their actual work, not templates.

Join the waitlist How it works

Free to start
No credit card
30-second signup

Find professors who actually fit you