Caiwen Ding


University of Minnesota · Computer Science and Engineering

Active 2015–2026

h-index: 27

Citations: 2.7k

Papers: 229 (166 in the last 5 years)

Funding: $214k

About

Caiwen Ding is an Associate Professor in the Department of Computer Science & Engineering at the University of Minnesota, Twin Cities. His research interests include algorithm-system co-design of ML/AI, computer architecture and heterogeneous computing, privacy-preserving machine learning, machine learning for electronic design automation, neuromorphic computing, computer vision, and natural language processing.

Research topics

Computer Science
Artificial Intelligence
Parallel computing
Computer engineering
Algorithm
Embedded system
Engineering
Electrical engineering
Computer architecture
Electronic engineering

Selected publications

FPTC: A Fast Parallel Transform-based Codec for Efficient Asymmetric Signal Compression
ArXiv.org · 2026-05-01
article · Open access · Senior author
Modern high-performance computing and Internet-of-Things deployments increasingly generate large volumes of signal data that must be compressed efficiently on resource-constrained acquisition devices and decompressed at scale on centralized servers. Lossy compression is widely adopted to minimize storage and transmission costs on low-power hardware sensors, yet existing methods rarely optimize for both reconstruction quality and decompression throughput simultaneously, nor do they apply methods that generalize across signal domains. In this work, we introduce FPTC, a high-throughput asymmetric signal codec that pairs a lightweight sequential encoder with a massively parallel GPU decoder designed for server-side batch decompression. FPTC applies a windowed discrete cosine transform (DCT) to exploit frequency-domain sparsity, quantizes spectral coefficients with a hybrid three-zone mapping, and entropy codes the result using Huffman coding with a novel packing scheme. The pipeline used in FPTC is designed to be throughput oriented on the GPU, maximizing performance without sacrificing reconstruction quality. We evaluate FPTC on ten datasets spanning four signal domains: biomedical diagnostic, seismic reflections, power-grid production metrics, and meteorological recordings. Our results demonstrate that FPTC outperforms existing frameworks in compression ratio while maintaining competitive throughput, achieving multiplicative compression performance of 3.6x (power), 3.1x (meteorological), 1.5x (biomedical), and 1.2x (seismic) over existing frameworks.
LLM-VeriPPA: Power, Performance, and Area Optimization aware Verilog Code Generation with Large Language Models
ArXiv.org · 2025-09-10
preprint · Open access · Senior author
Large Language Models (LLMs) are gaining prominence in various fields, thanks to their ability to generate high-quality content from human instructions. This paper delves into the field of chip design using LLMs, specifically in Power-Performance-Area (PPA) optimization and the generation of accurate Verilog codes for circuit designs. We introduce a novel framework VeriPPA designed to optimize PPA and generate Verilog code using LLMs. Our method includes a two-stage process where the first stage focuses on improving the functional and syntactic correctness of the generated Verilog codes, while the second stage focuses on optimizing the Verilog codes to meet PPA constraints of circuit designs, a crucial element of chip design. Our framework achieves an 81.37% success rate in syntactic correctness and 62.06% in functional correctness for code generation, outperforming current state-of-the-art (SOTA) methods on the RTLLM dataset. On the VerilogEval dataset, our framework achieves 99.56% syntactic correctness and 43.79% functional correctness, also surpassing SOTA, which stands at 92.11% for syntactic correctness and 33.57% for functional correctness. Furthermore, our framework is able to optimize the PPA of the designs. These results highlight the potential of LLMs in handling complex technical areas and indicate an encouraging development in the automation of chip design processes.
Attacking all tasks at once using adversarial examples in multi-task learning
Neurocomputing · 2025-09-14
preprint · Open access
Evaluating the Performance of Artificial Neural Networks with TiO₂, ZnO, and HfO₂ Memristors: Ideal, Degraded, and Refresh-Enhanced States
Selected topics in electronics and systems · 2025-12-17
book-chapter
Beyond Code Pairs: Dialogue-Based Data Generation for LLM Code Translation
ArXiv.org · 2025-11-29
preprint · Open access
Large language models (LLMs) have shown remarkable capabilities in code translation, yet their performance deteriorates in low-resource programming domains such as Fortran and emerging frameworks like CUDA, where high-quality parallel data are scarce. We present an automated dataset generation pipeline featuring a dual-LLM Questioner-Solver design that incorporates external knowledge from compilers and runtime feedback. Beyond traditional source-target code pair datasets, our approach additionally generates (1) verified translations with unit tests for assessing functional consistency, and (2) multi-turn dialogues that capture the reasoning process behind translation refinement. Applied to Fortran -> C++ and C++ -> CUDA, the pipeline yields 3.64k and 3.93k dialogues, respectively. Fine-tuning on this data yields dramatic improvements in functional correctness, boosting unit test success rates by over 56% on the challenging C++-to-CUDA task. We show this data enables a 7B open-weight model to significantly outperform larger proprietary systems on key metrics like compilation success.
Hardware Architecture for Convolutional Neural Network with Memristor-Bridges
Selected topics in electronics and systems · 2025-12-17
book-chapter
Hardware Architecture for Convolutional Neural Network with Memristor-Bridges
International Journal of High Speed Electronics and Systems · 2025-07-23
article
Integration of memristors into neuromorphic systems is receiving substantial attention due to their potential to facilitate energy-efficient and highly parallel in-memory computation. In this paper, a memristor-bridge based design for the convolution operation of a convolutional neural network (CNN) and its crossbar realization are developed. A LeNet-5 network is realized using the proposed design and tested on the MNIST dataset. The architecture includes circuit configurations for activation and pooling operations. The weight-mapping procedure for the memristor-bridges is developed in relation to the exact physics of the conduction mechanism of the memristor. Efficient modeling of the devices results in excellent performance of the network, achieving up to 99.08% inference accuracy. Isolation among the bridges and parallelization of the convolution operation lead to a rapid mapping within 0.11[Formula: see text][Formula: see text] and fast response in less than 20[Formula: see text][Formula: see text]. The overall energy consumption by the memristor units during mapping and inference remains well below [Formula: see text].
Attacking the spike: On the security of spiking neural networks to adversarial examples
Neurocomputing · 2025-09-12 · 2 citations
article · Open access
Graph Convolutional Network Acceleration Using Adiabatic Superconductor Josephson Devices
2025-06-08
article · Open access · Senior author

Recent grants

CAREER: Algorithm-Hardware Co-design of Efficient Large Graph Machine Learning for Electronic Design Automation
NSF · $214k · 2024–2025

Frequent coauthors

Yanzhi Wang
Sichuan University
74 shared
Hongwu Peng
34 shared
Shanglin Zhou
32 shared
Qinru Qiu
24 shared
Bo Yuan
Zhejiang University of Science and Technology
24 shared
Xuehai Qian
Purdue University System
24 shared
Geng Yuan
24 shared
Chenghong Wang
23 shared

Labs

UMN APEX (Algorithm-Platform Exploration for Efficient AI) Lab · PI
Research in ML/AI systems, computer architecture, privacy-preserving ML, and EDA.

Education

Ph.D.
Northeastern University
2019

Awards & honors

NSF CAREER Award
Amazon Research Award
CISCO Research Award
Best Paper Award at 2025 ICLAD
Outstanding Student Paper Award at 2023 HPEC
