
Peter A. Beerel
Professor of Electrical and Computer Engineering · University of Southern California · Ming Hsieh Department of Electrical and Computer Engineering
Active 1991–2026
About
Professor Peter A. Beerel leads the Energy Efficient Secure Sustainable Computing (E2S2C) group at the University of Southern California. His research spans circuits, micro-architecture, and algorithms with a focus on emerging areas in energy-efficient, secure, and sustainable computing. The group operates under principles of academic curiosity, integrity, and collaboration to address real-world problems through the mathematical foundations of Electrical and Computer Engineering. Current research projects under his leadership include machine-learning algorithm hardware co-design, superconducting electronics, hardware security, and asynchronous VLSI design. Additionally, the group engages in multidisciplinary collaborations, including efforts to mitigate wildfires. Professor Beerel's group actively seeks postdoctoral researchers, PhD candidates, and strong master's students to contribute to directed research in these areas.
Research topics
- Computer Science
- Engineering
- Artificial Intelligence
- Electrical engineering
Selected publications
MemRoPE: Training-Free Infinite Video Generation via Evolving Memory Tokens
arXiv (Cornell University) · 2026-03-12
Preprint · Open access · Senior author
Autoregressive diffusion enables real-time frame streaming, yet existing sliding-window caches discard past context, causing fidelity degradation, identity drift, and motion stagnation over long horizons. Current approaches preserve a fixed set of early tokens as attention sinks, but this static anchor cannot reflect the evolving content of a growing video. We introduce MemRoPE, a training-free framework with two co-designed components. Memory Tokens continuously compress all past keys into dual long-term and short-term streams via exponential moving averages, maintaining both global identity and recent dynamics within a fixed-size cache. Online RoPE Indexing caches unrotated keys and applies positional embeddings dynamically at attention time, ensuring the aggregation is free of conflicting positional phases. These two mechanisms are mutually enabling: positional decoupling makes temporal aggregation well-defined, while aggregation makes fixed-size caching viable for unbounded generation. Extensive experiments validate that MemRoPE outperforms existing methods in temporal coherence, visual fidelity, and subject consistency across minute- to hour-scale generation.
ArXiv.org · 2026-03-26
Article · Open access · Senior author
Superconducting Single Flux Quantum (SFQ) logic offers a promising platform for ultra-low-power, high-frequency computing. However, its pulse-based nature poses challenges for scalable modeling, design, and verification using conventional hardware description languages (HDLs), which are designed for level-based digital logic. Prior efforts have required complex Verilog support modules to enable Standard Delay Format (SDF) compatibility and have provided limited coverage of SFQ cell types. This work presents a Verilog-based modeling framework for SFQ gates that enables functional and timing verification while maintaining compatibility with SDF back annotation, and it is the first framework to support both synchronous and asynchronous SFQ gates. The proposed models are validated through device-level simulations, demonstrating correct functionality and timing constraint coverage. RTL simulation of mixed synchronous-asynchronous circuits further demonstrates the utility of the proposed framework.
qPRO-AQFP: Post-Routing Optimization of AQFP Circuits with Delay Line Clocking
arXiv (Cornell University) · 2026-04-09
Preprint · Open access · Senior author
Adiabatic Quantum-Flux-Parametron (AQFP) logic is an ultra-low-power superconducting logic family with energy consumption approaching the Shannon limit, making it attractive for quantum computing control and cryogenic computing systems. Traditional AQFP designs face significant physical design challenges due to strict gate-level clocking requirements and limited interconnect lengths, leading to substantial buffer overhead and difficult timing closure. Recently, delay-line clocking of AQFP has been proposed to improve timing margins and reduce latency by enabling more flexible clock scheduling. However, prior work has primarily focused on placement and latency minimization, while relying on fixed timing parameters that do not capture the frequency dependence of AQFP setup and hold constraints. To address this limitation, we propose a frequency-aware post-routing optimization framework that jointly optimizes clock period, latency, and timing slack under user-specified weighting. Experimental results across common benchmarks achieve 100% post-routing timing closure across a range of performance–latency–slack trade-offs. Our approach also automates phase-skipping, reducing path-balancing buffer insertion by 34% on average while only reducing operating frequency by 4%.
Branch Landing: Bloom Filter-Based Source Authorization for Forward-Edge CFI on RISC-V
arXiv (Cornell University) · 2026-04-25
Preprint · Open access · Senior author
Jump-Oriented Programming (JOP) attacks exploit indirect control transfers to bypass backward-edge defenses, yet existing forward-edge CFI mechanisms lack precise source-domain authorization: type-based CFI admits all same-signature callers, while tag-based hardware CFI is limited by fixed-width register storage that caps the number of simultaneously authorized sources. We propose Branch Landing (BRL), a landing-based forward-edge CFI framework for RISC-V that replaces fixed-capacity checks with Bloom filter membership queries. Two lightweight ISA extensions, bld and brl, propagate a source Section Identifier (SID) through a dedicated BRState register and validate it at each landing site with fixed-probe latency that is independent of the number of authorized sources under a chosen filter configuration. Section granularity is configurable, supporting policies from type-based to CFG-derived authorization within a single mechanism. We implement Branch Landing in the LLVM RISC-V backend and evaluate it on 81 BEEBS benchmarks under two representative policy configurations: a function-level, type-based policy and a basic-block-level, CFG-derived policy. Under a 3-cycle brl latency model, the two configurations incur average runtime overheads of only 0.210% and 0.421%, with mean code size growth of 0.46% and 0.52% respectively. The CFG-derived policy reduces the average equivalence class size by 32.5% compared to the type-based policy, and all evaluated executions complete without BRL enforcement failures.
Optimizing Phase-Scheduling with Throughput Trade-offs in AQFP Digital Circuits
IEEE Transactions on Applied Superconductivity · 2026-01-01
Article · Open access · Senior author
Adiabatic Quantum-Flux-Parametron (AQFP) logic is a promising emerging superconducting technology for ultralow power digital circuits, offering orders of magnitude lower power consumption than CMOS. However, AQFP scalability is challenged by excessive buffer overhead due to path balancing technology constraints. Addressing this, recent AQFP works have proposed design solutions to reduce path balancing overhead using phase-skipping and phase-alignment. Phase-skipping is a circuit-level technique that allows data transfer between AQFP gates clocked with non-consecutive clock phases. In contrast, phase-alignment is an architectural approach involving repeating input patterns to allow data transfer between AQFP gates across multiples of full clock cycles. While both techniques individually mitigate the area overhead of path-balancing, they have not yet been jointly explored. In this work, we present the first clock phase scheduling algorithm that combines phase-skipping and phase-alignment. We first present a minimum area method that, on average, achieves a 25% area reduction compared to phase-skipping alone and an 11% reduction compared to phase-alignment. We then extend the method to enforce a target throughput, enabling efficient area-performance trade-offs. With our throughput-constrained optimization, we achieve on average 6.8% area savings with a 2.62x increased throughput compared to the state-of-the-art phase-aligned method.
Breaking TinyML: Why Quantized Neural Networks Need Domain-Specific Security Analysis
IEEE Micro · 2026-01-01
Article
Most TinyML hardware focuses on supporting Quantized Neural Networks (QNNs) to meet stringent constraints on power consumption, size, and cost. Despite this, the security aspects of quantization within TinyML hardware remain largely unexplored. Although previous studies indicate that QNNs demonstrate similar or enhanced robustness when compared to full-precision Deep Neural Networks (DNNs) against typical evasion attacks, no attack strategies tailored specifically for TinyML hardware have been proposed yet. This paper addresses that gap by demonstrating how a two-step attack pipeline can surpass the current state of the art in the QNN context, and it shows the need for more hardware-aware security research.
NeVStereo: A NeRF-Driven NVS-Stereo Architecture for High-Fidelity 3D Tasks
Open MIND · 2026-02-05
Preprint
In modern dense 3D reconstruction, feed-forward systems (e.g., VGGT, pi3) focus on end-to-end matching and geometry prediction but do not explicitly output novel view synthesis (NVS). Neural rendering-based approaches offer high-fidelity NVS and detailed geometry from posed images, yet they typically assume fixed camera poses and can be sensitive to pose errors. As a result, it remains non-trivial to obtain a single framework that can offer accurate poses, reliable depth, high-quality rendering, and accurate 3D surfaces from casually captured views. We present NeVStereo, a NeRF-driven NVS-stereo architecture that aims to jointly deliver camera poses, multi-view depth, novel view synthesis, and surface reconstruction from multi-view RGB-only inputs. NeVStereo combines NeRF-based NVS for stereo-friendly renderings, confidence-guided multi-view depth estimation, NeRF-coupled bundle adjustment for pose refinement, and an iterative refinement stage that updates both depth and the radiance field to improve geometric consistency. This design mitigates common NeRF-based issues such as surface stacking, artifacts, and pose-depth coupling. Across indoor, outdoor, tabletop, and aerial benchmarks, our experiments indicate that NeVStereo achieves consistently strong zero-shot performance, with up to 36% lower depth error, 10.4% improved pose accuracy, 4.5% higher NVS fidelity, and state-of-the-art mesh quality (F1 91.93%, Chamfer 4.35 mm) compared to existing methods.
Recent grants
SHF: Small: Methodology, Tools, and Circuits for Bundled-Data Resilient Asynchronous Design
NSF · $400k · 2016–2020
SHF: Small: Reconditioning: Optimizing Conditional Communication in Asynchronous Design
NSF · $300k · 2011–2015
Frequent coauthors
- 70 shared
Gourav Datta
- 48 shared
Recep O. Ozdag
- 41 shared
Souvik Kundu
- 40 shared
Marcos Ferretti
- 35 shared
K.M. Chugg
- 35 shared
Souvik Kundu
Intel (United States)
- 35 shared
Massoud Pedram
University of Southern California
- 32 shared
Ellen Randall
Institute of Electrical and Electronics Engineers
Education
- 1990
Ph.D., Electrical Engineering
University of Southern California
- 1986
M.S., Electrical Engineering
University of Southern California
- 1984
B.S., Electrical Engineering
University of Southern California
Awards & honors
- VSoE Outstanding Teaching Award (1997)
- VSoE Junior Research Award (1998)
- NSF CAREER Award (1995)
- Zumberge Fellowship (1995)
- IEEE Region 6 Outstanding Engineer Award (2008)