Caiwen Ding

Verified

University of Minnesota · Computer Science and Engineering

Active 2015–2026

h-index: 27
Citations: 2.7k
Papers: 229 (166 in the last 5 years)
Funding: $214k

About

Caiwen Ding is an Associate Professor in the Department of Computer Science & Engineering at the University of Minnesota, Twin Cities. His research interests include algorithm-system co-design of ML/AI, computer architecture and heterogeneous computing, privacy-preserving machine learning, machine learning for electronic design automation, neuromorphic computing, computer vision, and natural language processing.

Research topics

  • Computer Science
  • Artificial Intelligence
  • Parallel computing
  • Computer engineering
  • Algorithm
  • Embedded system
  • Engineering
  • Electrical engineering
  • Computer architecture
  • Electronic engineering

Selected publications

  • FPTC: A Fast Parallel Transform-based Codec for Efficient Asymmetric Signal Compression

    ArXiv.org · 2026-05-01

    article · Open access · Senior author

    Modern high-performance computing and Internet-of-Things deployments increasingly generate large volumes of signal data that must be compressed efficiently on resource-constrained acquisition devices and decompressed at scale on centralized servers. Lossy compression is widely adopted to minimize storage and transmission costs on low-power hardware sensors, yet existing methods rarely optimize for both reconstruction quality and decompression throughput simultaneously, nor do they apply methods that generalize across signal domains. In this work, we introduce FPTC, a high-throughput asymmetric signal codec that pairs a lightweight sequential encoder with a massively parallel GPU decoder designed for server-side batch decompression. FPTC applies a windowed discrete cosine transform (DCT) to exploit frequency-domain sparsity, quantizes spectral coefficients with a hybrid three-zone mapping, and entropy codes the result using Huffman coding with a novel packing scheme. The pipeline used in FPTC is designed to be throughput oriented on the GPU, maximizing performance without sacrificing reconstruction quality. We evaluate FPTC on ten datasets spanning four signal domains: biomedical diagnostic, seismic reflections, power-grid production metrics, and meteorological recordings. Our results demonstrate that FPTC outperforms existing frameworks in compression ratio while maintaining competitive throughput, achieving multiplicative compression performance of 3.6x (power), 3.1x (meteorological), 1.5x (biomedical), and 1.2x (seismic) over existing frameworks.

  • LLM-VeriPPA: Power, Performance, and Area Optimization aware Verilog Code Generation with Large Language Models

    ArXiv.org · 2025-09-10

    preprint · Open access · Senior author

    Large Language Models (LLMs) are gaining prominence in various fields, thanks to their ability to generate high-quality content from human instructions. This paper delves into the field of chip design using LLMs, specifically in Power-Performance-Area (PPA) optimization and the generation of accurate Verilog code for circuit designs. We introduce a novel framework, VeriPPA, designed to optimize PPA and generate Verilog code using LLMs. Our method includes a two-stage process where the first stage focuses on improving the functional and syntactic correctness of the generated Verilog code, while the second stage focuses on optimizing the Verilog code to meet the PPA constraints of circuit designs, a crucial element of chip design. On the RTLLM dataset, our framework achieves an 81.37% success rate in syntactic correctness and 62.06% in functional correctness for code generation, outperforming current state-of-the-art (SOTA) methods. On the VerilogEval dataset, our framework achieves 99.56% syntactic correctness and 43.79% functional correctness, also surpassing SOTA, which stands at 92.11% for syntactic correctness and 33.57% for functional correctness. Furthermore, our framework is able to optimize the PPA of the designs. These results highlight the potential of LLMs in handling complex technical areas and indicate an encouraging development in the automation of chip design processes.

  • Attacking all tasks at once using adversarial examples in multi-task learning

    Neurocomputing · 2025-09-14

    preprint · Open access
  • Evaluating the Performance of Artificial Neural Networks with TiO₂, ZnO, and HfO₂ Memristors: Ideal, Degraded, and Refresh-Enhanced States

    Selected topics in electronics and systems · 2025-12-17

    book-chapter
  • Beyond Code Pairs: Dialogue-Based Data Generation for LLM Code Translation

    ArXiv.org · 2025-11-29

    preprint · Open access

    Large language models (LLMs) have shown remarkable capabilities in code translation, yet their performance deteriorates in low-resource programming domains such as Fortran and emerging frameworks like CUDA, where high-quality parallel data are scarce. We present an automated dataset generation pipeline featuring a dual-LLM Questioner-Solver design that incorporates external knowledge from compilers and runtime feedback. Beyond traditional source-target code pair datasets, our approach additionally generates (1) verified translations with unit tests for assessing functional consistency, and (2) multi-turn dialogues that capture the reasoning process behind translation refinement. Applied to Fortran -> C++ and C++ -> CUDA, the pipeline yields 3.64k and 3.93k dialogues, respectively. Fine-tuning on this data yields dramatic improvements in functional correctness, boosting unit test success rates by over 56% on the challenging C++-to-CUDA task. We show this data enables a 7B open-weight model to significantly outperform larger proprietary systems on key metrics like compilation success.

  • Hardware Architecture for Convolutional Neural Network with Memristor-Bridges

    Selected topics in electronics and systems · 2025-12-17

    book-chapter
  • Hardware Architecture for Convolutional Neural Network with Memristor-Bridges

    International Journal of High Speed Electronics and Systems · 2025-07-23

    article

    Integration of memristors into neuromorphic systems is receiving substantial attention due to their potential to facilitate energy-efficient and highly parallel in-memory computation. In this paper, a memristor-bridge-based design for the convolution operation of a convolutional neural network (CNN) and its crossbar realization are developed. A LeNet-5 network is realized using the proposed design and tested on the MNIST dataset. The architecture includes circuit configurations for activation and pooling operations. The weight-mapping procedure for the memristor-bridges is developed in relation to the exact physics of the conduction mechanism of the memristor. Efficient modeling of the devices results in excellent performance of the network, achieving up to 99.08% inference accuracy. Isolation among the bridges and parallelization of the convolution operation lead to rapid mapping within 0.11[Formula: see text][Formula: see text] and a fast response in less than 20[Formula: see text][Formula: see text]. The overall energy consumption by the memristor units during mapping and inference remains well below [Formula: see text].

  • Attacking the spike: On the security of spiking neural networks to adversarial examples

    Neurocomputing · 2025-09-12 · 2 citations

    article · Open access
  • Graph Convolutional Network Acceleration Using Adiabatic Superconductor Josephson Devices

    2025-06-08

    article · Open access · Senior author
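
The FPTC abstract above describes a transform-then-quantize pipeline: a windowed DCT followed by quantization of the spectral coefficients. As a rough illustration of that general idea only, here is a minimal Python sketch. It assumes a fixed non-overlapping window and a plain uniform quantizer; it does not reproduce FPTC's hybrid three-zone mapping, its Huffman packing scheme, or its GPU decoder, and the names `dct2`, `idct2`, `encode`, `decode`, `window`, and `step` are hypothetical.

```python
import math


def dct2(block):
    """Orthonormal DCT-II of one window (naive O(n^2) version)."""
    n = len(block)
    out = []
    for k in range(n):
        s = sum(x * math.cos(math.pi * (i + 0.5) * k / n)
                for i, x in enumerate(block))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def idct2(coeffs):
    """Inverse of dct2 (orthonormal DCT-III)."""
    n = len(coeffs)
    out = []
    for i in range(n):
        s = coeffs[0] * math.sqrt(1.0 / n)
        for k in range(1, n):
            s += (coeffs[k] * math.sqrt(2.0 / n)
                  * math.cos(math.pi * (i + 0.5) * k / n))
        out.append(s)
    return out


def encode(signal, window=8, step=0.5):
    """Split into fixed windows, transform, and uniformly quantize.

    Assumption: the signal length is a multiple of `window`.
    """
    if len(signal) % window != 0:
        raise ValueError("signal length must be a multiple of window")
    return [
        [round(c / step) for c in dct2(signal[s:s + window])]
        for s in range(0, len(signal), window)
    ]


def decode(blocks, step=0.5):
    """Dequantize each window and invert the transform."""
    out = []
    for q in blocks:
        out.extend(idct2([v * step for v in q]))
    return out
```

Because the transform here is orthonormal, each coefficient's quantization error of at most `step / 2` bounds the per-window reconstruction error, which is why a coarser `step` trades accuracy for smaller integers to entropy code.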

Frequent coauthors

  • Yanzhi Wang

    Sichuan University

    74 shared
  • Hongwu Peng

    34 shared
  • Shanglin Zhou

    32 shared
  • Qinru Qiu

    24 shared
  • Bo Yuan

    Zhejiang University of Science and Technology

    24 shared
  • Xuehai Qian

    Purdue University System

    24 shared
  • Geng Yuan

    24 shared
  • Chenghong Wang

    23 shared

Education

  • Ph.D.

    Northeastern University

    2019

Awards & honors

  • NSF CAREER Award
  • Amazon Research Award
  • CISCO Research Award
  • Best Paper Award at 2025 ICLAD
  • Outstanding Student Paper Award at 2023 HPEC