Andre DeHon
Professor · University of Pennsylvania · Electrical Engineering
Active 1990–2026
About
Andre DeHon is the Oliver C. Boileau Jr. and Nan Eleze Boileau Professor of Electrical Engineering at the University of Pennsylvania, affiliated with Electrical and Systems Engineering and with Computer and Information Science. He is the founding chair of Penn's Computer Engineering program and the founding director of the CyberSavvy Research Center, a nationwide security initiative sponsored by DARPA. He also chairs the ACM/SIGDA Technical Committee on FPGAs and Reconfigurable Computing.
Prof. DeHon is a leading researcher in computer engineering whose work is grounded in the engineering science of reconfigurable computing and computer security. His Implementation of Computation Group explores how computations can be physically implemented through hardware and software co-design. His work focuses on designing adaptable, resilient, and efficient hardware architectures that can be dynamically reprogrammed for a variety of computational tasks, along with tools that support these architectures. His research emphasizes reconfigurable computing, FPGA architectures, and interconnect, with attention to security, reliability, energy efficiency, and performance.
More broadly, his research asks how to physically implement computations, spanning physical substrates such as VLSI and molecular technologies; programmable media such as FPGAs and multiprocessors; mapping through compilation and CAD; system abstractions and dynamic management, including run-time and operating systems; and problem capture through programming languages.
He was an assistant professor of computer science at Caltech from 1999 to 2006 and a postdoctoral researcher at UC Berkeley from 1996 to 1999, and he earned his SB, SM, and PhD degrees from MIT in 1990, 1993, and 1996, respectively. His work has significantly advanced the fields of reconfigurable computing and hardware security, establishing foundational principles and pushing the state of the art in FPGA-based computation and system design.
Research topics
- Computer Science
- Computer Security
- Telecommunications
- World Wide Web
- Operating system
- Real-time computing
- Computer network
Selected publications
HiLFS: FPGA-Orchestrated File System for High-Level Synthesis
2026-02-05
Article · Open access
Field Programmable Gate Arrays (FPGAs) deliver high performance, and High-Level Synthesis (HLS) simplifies computation description. However, modern FPGA systems cannot directly exploit the convenience and advanced features of contemporary file systems to enable performant, secure, and robust access to high-speed storage devices such as SSDs. This limitation significantly impedes the adoption of FPGAs in data- and I/O-intensive applications, including large language models (LLMs). Existing HLS storage solutions typically either rely on host CPUs to manage file systems via the operating system stack or provide only low-level block access, both of which introduce considerable performance and programmability overheads. Host-mediated access to storage incurs additional latency due to multiple round-trips through the OS kernel on the host CPU, while block-level management on the FPGA side demands substantial engineering effort, often recreating file system functionality such as raw block management, security, and robustness guarantees. These challenges substantially complicate FPGA development and create a 100× gap in scalability compared to GPUs for deploying modern, large-scale machine learning models. To close this gap, we propose HiLFS, the first file system and storage stack for HLS that manages storage entirely within the FPGA. HiLFS exposes a POSIX-like file interface to HLS kernels to ease programming and maintains an on-chip cache of recently accessed file metadata to accelerate file access. It also provides rich file system features, including data integrity, crash consistency, durability guarantees, and efficient concurrent access. As such, HiLFS enables high-performance, secure, and reliable storage management, completely eliminating the need for host intervention. We prototype HiLFS on an AMD/Xilinx Alveo U200 FPGA with a Solidigm DC-P4610 SSD. On Mixtral 8×7B, HiLFS outperforms an Nvidia Titan RTX by 1.1/1.3× in performance and 3.0/3.5× in energy efficiency with/without GPUDirect Storage. To the best of our knowledge, this represents the largest-scale LLM deployment on an FPGA to date. Moreover, HiLFS delivers 1.5/1.8× average latency and bandwidth improvements over state-of-the-art commercial CPU-centric storage platforms with/without PCIe P2P, while incurring 13% bandwidth and latency overhead relative to state-of-the-art low-level HLS block storage work. Furthermore, HiLFS reduces lines of code (LoC) by 1.5× and 5.3× compared to CPU-centric and block-level storage platforms, respectively.
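The on-chip metadata cache mentioned in the HiLFS abstract can be pictured as a small table of recently used file records keyed by path. The sketch below is a software model only, assuming a least-recently-used (LRU) eviction policy and a hypothetical `FileMeta` record; the abstract does not state which replacement policy or metadata layout HiLFS actually uses, and an HLS design would use on-chip memory rather than a Python dictionary.

```python
from __future__ import annotations

from collections import OrderedDict
from dataclasses import dataclass


@dataclass(frozen=True)
class FileMeta:
    """Hypothetical per-file metadata record (not HiLFS's real layout)."""
    inode: int
    size_bytes: int
    first_block: int


class MetadataCache:
    """Fixed-capacity LRU cache of file metadata, keyed by path (assumed policy)."""

    def __init__(self, capacity: int, backing: dict[str, FileMeta]):
        self.capacity = capacity
        self.backing = backing                  # stands in for metadata stored on the SSD
        self.entries: OrderedDict[str, FileMeta] = OrderedDict()
        self.hits = 0
        self.misses = 0

    def lookup(self, path: str) -> FileMeta:
        if path in self.entries:
            self.hits += 1
            self.entries.move_to_end(path)      # mark as most recently used
            return self.entries[path]
        self.misses += 1
        meta = self.backing[path]               # slow path: fetch from storage
        self.entries[path] = meta
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)    # evict the least recently used entry
        return meta


# Usage: a two-entry cache in front of four hypothetical model-shard files.
disk = {f"/model/shard{i}": FileMeta(i, 1 << 30, 1000 * i) for i in range(4)}
cache = MetadataCache(capacity=2, backing=disk)
for p in ["/model/shard0", "/model/shard1", "/model/shard0",
          "/model/shard2", "/model/shard1"]:
    cache.lookup(p)
```

In this trace the second access to `/model/shard0` hits, while `/model/shard1` misses a second time because it was evicted when `/model/shard2` arrived.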
Hardware Accelerated FPGA Divide-and-Conquer Page Placement in Milliseconds
2026-02-05
Article · Open access · Senior author
Excessive FPGA compilation times, often measured in hours, stifle rapid iterative development, design-space exploration, and runtime reconfiguration applications. Coarse-grain divide-and-conquer techniques, which break large applications into separately compiled pages, offer moderate speedups, potentially bringing compilation down to minutes, but leave significant fine-grain parallelism untapped. Systolic-array-based accelerators have previously offered orders-of-magnitude speedups for FPGA placement (a major bottleneck in compilation) by exploiting massive fine-grain parallelism; however, poor scalability restricts them to small designs, and lack of support for modern heterogeneous netlists (CLBs, BRAMs, DSPs) prevents their use today. We introduce an enhanced FPGA-based systolic placement accelerator capable of placing divide-and-conquer page-sized netlists of CLBs, BRAMs, and DSPs onto VTR architectures, with 2–3 orders of magnitude speedup over VTR-9 running on a modern workstation-class processor. We demonstrate page placement in milliseconds on realistic benchmarks, including HLS dataflow designs, run on an AMD Versal VCK190 implementation of our systolic placer, forging a path towards real-time, self-hosted FPGA compilation.
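Systolic placers gain their speed by evaluating many candidate swaps at once against a wirelength cost. As a hedged illustration, not the paper's algorithm, the sketch below uses half-perimeter wirelength (HPWL), a common placement cost, and runs one odd-even pass of greedy neighbor swaps along a single row. Within each phase the compared pairs are disjoint, which is what would let an array of processing elements evaluate them side by side. The one-row layout, the greedy acceptance rule, and the function names are assumptions for illustration.

```python
def hpwl(nets, placement):
    """Half-perimeter wirelength: sum of bounding-box width + height per net."""
    total = 0
    for net in nets:
        xs = [placement[block][0] for block in net]
        ys = [placement[block][1] for block in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


def row_cost(order, nets):
    """HPWL when order[i] sits at grid site (i, 0)."""
    return hpwl(nets, {block: (i, 0) for i, block in enumerate(order)})


def odd_even_swap_pass(order, nets):
    """One even phase, then one odd phase, of greedy neighbor swaps.

    Pairs inside a phase are disjoint, so hardware could evaluate them
    concurrently; this sequential model simply checks them one at a time.
    """
    order = list(order)
    for phase_start in (0, 1):                        # even phase, then odd phase
        for i in range(phase_start, len(order) - 1, 2):
            trial = order[:]
            trial[i], trial[i + 1] = trial[i + 1], trial[i]
            if row_cost(trial, nets) < row_cost(order, nets):  # keep improving swaps only
                order = trial
    return order


# Usage: two two-pin nets whose endpoints start two sites apart.
nets = [["a", "c"], ["b", "d"]]
initial = ["a", "b", "c", "d"]
placed = odd_even_swap_pass(initial, nets)
```

Here the even phase finds no improving swap, and the odd phase swaps `b` and `c`, halving the total HPWL from 4 to 2.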
Programming FPGAs for economics: An introduction to electrical engineering economics
Quantitative Economics · 2025-01-01 · 1 citation
Article · Open access
We show how to use field-programmable gate arrays (FPGAs) and their associated high-level synthesis (HLS) compilers to solve heterogeneous agent models with incomplete markets and aggregate uncertainty (Krusell and Smith, 1998). We document that the acceleration delivered by a single FPGA is comparable to that provided by using 69 CPU cores in a conventional cluster. The time to solve 1200 versions of the model drops from 8 hours to 7 minutes, illustrating great potential for structural estimation. We describe how to exploit multiple acceleration opportunities (pipelining, data-level parallelism, and data precision) with minimal modification of the C/C++ code written for a traditional sequential processor, which we then deploy on FPGAs readily available at Amazon Web Services. We quantify the speedup and cost of these accelerations. Our paper is the first step toward a new field, electrical engineering economics, focused on designing computational accelerators for economics to tackle challenging quantitative models. Replication code is available on GitHub.
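The two headline figures in this abstract can be cross-checked with simple arithmetic. Reading both as relative to a single CPU core is an assumption, since the abstract does not say what hardware the 8-hour baseline runs on, but under that reading they agree closely:

```latex
\frac{8\,\text{h}}{7\,\text{min}}
  = \frac{480\,\text{min}}{7\,\text{min}}
  \approx 68.6\times
```

A speedup of roughly 68.6× is in line with the reported equivalence of one FPGA to 69 CPU cores.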
REFINE: Runtime Execution Feedback for INcremental Evolution on FPGA Designs
2024-04-01 · 1 citation
Article · Open access · Senior author
FPGA design optimization is challenging for developers for two main reasons. First, developers cannot easily identify the bottleneck of a design to know where to focus optimization effort to improve application execution time. Second, slow, monolithic FPGA compilation makes evaluating each design change costly. Together, these make FPGA development different from, and more challenging than, traditional software development, where engineers are accustomed to using rich profiling tools to improve their designs through a series of quick, incremental refinements. To address these issues, we propose a fast bottleneck identification scheme using runtime feedback and separate FPGA compilation. Our scheme systematically identifies bottlenecks in streaming computations based on FIFO event counters extracted from hardware execution and guides developers to the operations that limit performance. We combine bottleneck identification with fast, automatic design-space exploration, iterating over initial design points quickly with a separate, incremental compilation strategy. When latency can no longer improve under separate compilation, we migrate to the monolithic design flow, which avoids the area overhead and communication bandwidth limit of the separate compilation approach; the remaining design space, if any, is then explored with the monolithic flow. When tested on the AMD ZCU102 embedded platform with realistic HLS dataflow designs, our approach correctly identifies bottlenecks, improving application latency by 2.2–12.7× while reducing tuning time by 1.3–2.7× compared to the monolithic flow.
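The abstract does not give the exact rule REFINE applies to its FIFO event counters. A minimal sketch of one plausible heuristic, assuming each FIFO reports how many cycles it was full and how many it was empty, scores each stage of a linear pipeline by how often its upstream producer is blocked and its downstream consumer is starved. The `FifoCounters` record, the counter names, and the scoring formula are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FifoCounters:
    """Hypothetical event counters sampled from one hardware FIFO."""
    cycles_full: int   # cycles the producer found this FIFO full (stalled)
    cycles_empty: int  # cycles the consumer found this FIFO empty (starved)


def rank_bottlenecks(stages, fifos, total_cycles):
    """Rank stages of a linear streaming pipeline by an assumed stall score.

    fifos[i] connects stages[i] -> stages[i + 1]. A stage scores high when its
    input FIFO is often full (upstream waits on it) and its output FIFO is
    often empty (downstream waits on it).
    """
    assert len(fifos) == len(stages) - 1
    scores = {}
    for k, name in enumerate(stages):
        upstream_blocked = fifos[k - 1].cycles_full / total_cycles if k > 0 else 0.0
        downstream_starved = fifos[k].cycles_empty / total_cycles if k < len(fifos) else 0.0
        scores[name] = upstream_blocked + downstream_starved
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Usage: the middle stage backs up its input and starves its output.
ranking = rank_bottlenecks(
    ["read", "compute", "write"],
    [FifoCounters(cycles_full=800, cycles_empty=10),
     FifoCounters(cycles_full=5, cycles_empty=700)],
    total_cycles=1000,
)
```

The top-ranked stage is where a developer would focus optimization effort first, which mirrors the guidance role the abstract describes.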
Asymmetry in Butterfly Fat Tree FPGA NoC
2023-12-12 · 1 citation
Article · Senior author
Among various topologies for FPGA overlay Network-on-Chip (NoC), the Butterfly Fat Tree (BFT) is known to be fast and lightweight. The BFT has a hierarchical structure that allows the routing capacity of each level to be configured with bandwidth-reducing t switches and bandwidth-preserving π switches, and this configuration can be exploited to customize NoC resources, spending area as needed to match the bandwidth requirements of the application. However, a traditional BFT is symmetric: switch types are identical across all subtrees at the same level, which does not fully exploit the customization offered by the FPGA. We evaluate asymmetric BFTs that have different bandwidth in their subtrees, and we develop a converging switch, built with t switches, that connects subtrees with different bandwidths. Given the same resource budget, asymmetric BFTs perform better than symmetric BFTs when NoC traffic is highly unbalanced. In realistic workloads and statistical traffic patterns, asymmetric BFTs achieve up to 32% and 76% more throughput than symmetric BFTs, respectively.
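The effect of choosing t or π switches at each level can be made concrete by counting upward links. A minimal sketch, assuming a binary tree in which each leaf has one link, a `t` level halves the combined bundle arriving from its two child subtrees, and a `pi` level passes it through unchanged; the abstract describes the switches only as bandwidth-reducing and bandwidth-preserving, so the exact port counts here are an assumption.

```python
def uplinks_per_level(levels):
    """Upward links leaving one subtree at each level of a binary BFT model.

    levels: sequence of "t" or "pi", ordered from the leaves upward.
    """
    up = 1                      # each leaf (processing element) has one link
    result = []
    for kind in levels:
        incoming = 2 * up       # two child subtrees feed this level
        if kind == "t":
            up = incoming // 2  # bandwidth-reducing: halve the bundle
        elif kind == "pi":
            up = incoming       # bandwidth-preserving: pass it through
        else:
            raise ValueError(f"unknown switch type: {kind!r}")
        result.append(up)
    return result


# Usage: three level patterns, each spending a different amount of switch area.
full = uplinks_per_level(["pi", "pi", "pi"])
alternating = uplinks_per_level(["pi", "t", "pi", "t"])
tree = uplinks_per_level(["t", "t", "t"])
```

An all-`pi` tree keeps upward bandwidth proportional to subtree size (8 links above 8 leaves), an all-`t` tree collapses to a single link, and alternating the two yields 4 links above 16 leaves, growing as the square root of subtree size. An asymmetric BFT, as evaluated in the paper, would let sibling subtrees at the same level follow different sequences.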
ExHiPR: Extended High-Level Partial Reconfiguration for Fast Incremental FPGA Compilation
ACM Transactions on Reconfigurable Technology and Systems · 2023-09-14 · 5 citations
Article · Open access · Senior author
Partial Reconfiguration (PR) is a key technique in application design on modern FPGAs. However, current PR tools rely heavily on the developer to manually conduct PR module definition, floorplanning, and flow control at a low level. Nor do existing PR tools consider High-Level Synthesis languages, which are of great interest to software developers. We propose HiPR, an open-source framework, to bridge the gap between HLS and PR. HiPR allows the developer to define partially reconfigurable C/C++ functions, instead of Verilog modules, to accelerate incremental FPGA compilation and automate the flow from C/C++ to bitstreams. We use a lightweight Simulated Annealing floorplanner and show that it can produce high-quality PR floorplans an order of magnitude faster than analytic methods. By mapping Rosetta HLS benchmarks, we demonstrate that incremental compilation can be accelerated by 3–10× compared with the state-of-the-art Xilinx Vitis flow without performance loss, at the cost of 15–67% one-time overlay set-up time.
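For readers unfamiliar with simulated annealing floorplanning, the sketch below shows the textbook technique in miniature: assign modules to slots, propose random swaps, and accept worse moves with probability exp(-Δ/T) while the temperature cools. It is not HiPR's floorplanner, which must also respect PR region shapes and resource constraints; the Manhattan-distance cost, the schedule parameters, and all names are assumptions.

```python
import math
import random


def wire_cost(assign, edges):
    """Total Manhattan distance between communicating modules."""
    return sum(abs(assign[u][0] - assign[v][0]) + abs(assign[u][1] - assign[v][1])
               for u, v in edges)


def anneal_floorplan(modules, edges, slots, seed=0, t0=5.0, cooling=0.95,
                     moves_per_temp=50, t_min=0.01):
    """Toy simulated-annealing floorplanner: one slot per module, swap moves."""
    assert len(slots) == len(modules)
    rng = random.Random(seed)                  # seeded for reproducible runs
    assign = dict(zip(modules, slots))         # initial floorplan: given order
    current = wire_cost(assign, edges)
    best, best_cost = dict(assign), current
    temp = t0
    while temp > t_min:
        for _ in range(moves_per_temp):
            u, v = rng.sample(modules, 2)      # propose swapping two modules' slots
            assign[u], assign[v] = assign[v], assign[u]
            new_cost = wire_cost(assign, edges)
            delta = new_cost - current
            # Metropolis rule: always accept improvements, sometimes accept worse moves
            if delta <= 0 or rng.random() < math.exp(-delta / temp):
                current = new_cost
                if current < best_cost:
                    best, best_cost = dict(assign), current
            else:
                assign[u], assign[v] = assign[v], assign[u]  # reject: undo the swap
        temp *= cooling                        # geometric cooling schedule
    return best, best_cost


# Usage: a four-module chain that starts in a deliberately poor arrangement.
modules = ["a", "b", "c", "d"]
edges = [("a", "b"), ("b", "c"), ("c", "d")]
slots = [(0, 0), (1, 1), (0, 1), (1, 0)]
plan, plan_cost = anneal_floorplan(modules, edges, slots)
```

The initial arrangement costs 5; the minimum for a four-module chain on a 2×2 grid is 3, since every edge needs at least one unit of distance.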
HiPR: High-level Partial Reconfiguration for Fast Incremental FPGA Compilation
2022-08-01 · 9 citations
Article · Senior author
Partial Reconfiguration (PR) is a key technique in application design on modern FPGAs. However, current PR tools rely heavily on developers to manually conduct PR module definition, floorplanning, and flow control at a low level. Nor do existing PR tools consider High-Level Synthesis languages, which are of great interest to software developers. We propose HiPR, an open-source framework, to bridge the gap between HLS and PR. HiPR allows the developer to define partially reconfigurable C/C++ functions instead of Verilog modules, which speeds up incremental FPGA compilation and automates the flow from C/C++ to bitstreams. By mapping Rosetta HLS benchmarks, we show that incremental compilation can be accelerated by 3–10× compared with the normal Xilinx Vitis flow without performance loss.
Fast and Flexible FPGA Development using Hierarchical Partial Reconfiguration
2022-12-05 · 9 citations
Article · Senior author
To address slow FPGA compilation, researchers have proposed running separate compilations for smaller design components in parallel. This approach provides small pages on the FPGA, allowing users to separately generate partial designs for the pages and load them together. However, this method either forces users to manually decompose a design into small components that fit in small, fixed-size pages, or to use large, fixed-size pages, which reduces the potential compilation speedup. This restriction often results in suboptimal decomposition of a design or diminished productivity. To overcome these limitations, we use the recently supported Hierarchical Partial Reconfiguration technology from Xilinx to build a more flexible framework. Depending on the size of user designs, our framework provides larger pages that are hierarchically recombined from multiple smaller pages. This flexibility relieves users of the burden of decomposing the original design and offers more opportunities for design-space exploration. When tested on the ZCU102 embedded platform with the Rosetta HLS benchmarks, our system achieves a 1.4–4.9× mapped application performance improvement compared to the system with fixed-size pages while still compiling in 2–5 minutes (2.2–5.3× faster than the vendor tool).
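The benefit of hierarchically recombined pages shows up in a simple sizing model. As a hypothetical sketch, assume equal-capacity leaf pages and that two pages at one level combine into one page at the next, so a level-k page holds 2^k leaf pages. The abstract does not specify the actual page geometry, and real pages constrain several resource types (LUTs, BRAMs, DSPs) rather than a single count; the 10,000-LUT leaf size and component names below are invented for illustration.

```python
def page_capacity(level, leaf_capacity):
    """Capacity of a level-k page built from 2**k leaf pages (assumed binary merging)."""
    return leaf_capacity * 2 ** level


def smallest_page_level(required, leaf_capacity, max_level):
    """Smallest hierarchy level whose page can hold `required` resources, else None."""
    for level in range(max_level + 1):
        if required <= page_capacity(level, leaf_capacity):
            return level
    return None


LEAF_LUTS = 10_000  # hypothetical leaf-page size in LUTs
components = {"small_filter": 8_000, "big_kernel": 25_000, "huge_kernel": 90_000}
levels = {name: smallest_page_level(luts, LEAF_LUTS, max_level=3)
          for name, luts in components.items()}
```

Under fixed 10,000-LUT pages, `big_kernel` would have to be split by hand; in this model it fits a level-2 page made of four leaf pages. `huge_kernel` exceeds even the largest (level-3) page and would still need decomposition.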
Programming FPGAs for Economics: An Introduction to Electrical Engineering Economics
National Bureau of Economic Research · 2022-04-01 · 3 citations
Report · Open access
We show how to use field-programmable gate arrays (FPGAs) and their associated high-level synthesis (HLS) compilers to solve heterogeneous agent models with incomplete markets and aggregate uncertainty (Krusell and Smith, 1998). We document that the acceleration delivered by a single FPGA is comparable to that provided by using 74 CPU cores in a conventional cluster. We describe how to exploit multiple acceleration opportunities (pipelining, data-level parallelism, and data precision) with minimal modification of the C code written for a traditional sequential processor, which we then deploy on FPGAs readily available at Amazon Web Services. We quantify the speedup and cost of these accelerations. Our paper is the first step toward a new field, electrical engineering economics, focused on designing computational accelerators for economics to tackle challenging quantitative models.
2022-02-22 · 18 citations
Article · Open access · Senior author
FPGA-based accelerators are demonstrating significant absolute performance and energy efficiency compared with general-purpose CPUs. While FPGA computations can now be described in standard programming languages such as C, development of FPGA accelerators remains tedious and inaccessible to modern software engineers. Slow compiles (potentially taking tens of hours) inhibit the rapid, incremental refinement of designs that is the hallmark of modern software engineering. To address this issue, we introduce separate compilation and linkage into the FPGA design flow, providing the faster design turns familiar from software development. To realize this flow, we provide abstractions, compiler options, and a compiler flow that allow the same C source code to be compiled to processor cores in seconds and to FPGA regions in minutes, providing the missing -O0 and -O1 options familiar in software development. This raises the FPGA programming level and standardizes the programming experience, bringing FPGA-based accelerators into a more familiar software platform ecosystem for software engineers.
Recent grants
CAREER: Interconnect Design for Programmable Computation
NSF · $106k · 2007–2008
SHF: MEDIUM: Semiconductor Life Extension through Reconfiguration
NSF · $750k · 2009–2013
Nanoscale Coded Computation and Storage
NSF · $200k · 2007–2010
Frequent coauthors
- Raphael Rubin · University of Pennsylvania · 20 shared publications
- Nachiket Kapre · University of Waterloo · 18 shared publications
- Benjamin Gojman · University of Pennsylvania · 18 shared publications
- Thomas F. Knight · Vanderbilt University · 17 shared publications
- John Wawrzynek · 17 shared publications
- Jonathan M. Smith · University of Pennsylvania · 14 shared publications
- Nikil Mehta · California Institute of Technology · 14 shared publications
- Yuanlong Xiao · 14 shared publications
Labs
Implementation of Computation Group (PI)
Education
- 1996
Ph.D., Electrical Engineering and Computer Science
Massachusetts Institute of Technology
- 1993
S.M., Electrical Engineering and Computer Science
Massachusetts Institute of Technology
- 1990
S.B., Electrical Engineering and Computer Science
Massachusetts Institute of Technology