ChengXiang Zhai

Donald Biggar Willett Professor in Engineering

University of Illinois Urbana-Champaign · Computer Science

Active 1990–2026

h-index: 83

Citations: 29.8k

Papers: 553 (126 in the last 5 years)

Funding: $1.9M



About

ChengXiang Zhai is the Donald Biggar Willett Professor in Engineering at the University of Illinois Urbana-Champaign, affiliated with the Siebel School of Computing and Data Science. His research areas include Artificial Intelligence, Bioinformatics and Computational Biology, Computers and Education, and Data and Information Systems. He has received numerous awards for his research and teaching, including the ACM SIGIR Gerard Salton Award in 2021, induction into the ACM SIGIR Academy in 2020, and the Presidential Early Career Award for Scientists and Engineers in 2004. Zhai has also been recognized for excellence in graduate student mentoring and undergraduate advising, and has received multiple teaching awards. He is distinguished by his leadership in both research and education in computing and data science.

Research topics

Computer Science
Artificial Intelligence
Machine Learning
Engineering
Data Mining
Natural Language Processing
Computational Biology
Biology
Epistemology
Philosophy
Biochemical Engineering
Linguistics
Data Science

Selected publications

Globally Optimal Training of Spiking Neural Networks via Parameter Reconstruction
ArXiv.org · 2026-05-08
article · Open access · Senior author
Spiking Neural Networks (SNNs) have been proposed as biologically plausible and energy-efficient alternatives to conventional Artificial Neural Networks (ANNs). However, SNN training usually relies on surrogate gradients because the spike function is non-differentiable, and these surrogates introduce approximation errors that accumulate across layers. To address this challenge, we extend prior work on the convexification of parallel feedforward threshold networks to parallel recurrent threshold networks, which subsume parallel SNNs as a structured special case. Building on this theoretical framework, we propose a parameter reconstruction algorithm for SNN training that shows consistent and significant advantages across various tasks, both as a standalone method and in combination with surrogate-gradient training. Ablations further demonstrate the algorithm's data scalability and robustness to model configuration, pointing toward its potential for large-scale SNN training.
SimLab: A Platform for Simulation-based Evaluation of Conversational Information Access Systems
ArXiv.org · 2025-07-07
preprint · Open access · Senior author
Progress in conversational information access (CIA) systems has been hindered by the difficulty of evaluating such systems with reproducible experiments. While user simulation offers a promising solution, the lack of infrastructure and tooling to support this evaluation paradigm remains a significant barrier. To address this gap, we introduce SimLab, the first cloud-based platform providing a centralized solution for the community to benchmark both conversational systems and user simulators in a controlled and reproducible setting. We articulate the requirements for such a platform and propose a general infrastructure to meet them. We then present the design and implementation of an initial version of SimLab and showcase its features through an initial simulation-based evaluation task in conversational movie recommendation. Furthermore, we discuss the platform's sustainability and future opportunities for development, inviting the community to drive further progress in the fields of CIA and user simulation.
InstInfo: A Just-in-Time Literature Recommendation System for Presentations
2025-07-13
article · Open access · Senior author
The efficient discovery of academic literature is critical for research progress, yet many researchers have difficulty finding relevant literature. This work proposes InstInfo, a novel just-in-time literature recommendation system for presentations. InstInfo transcribes audio in real time and recommends literature based on the ideas being discussed, helping researchers ground presentations in academic literature while saving them the time of searching manually. Informal usability studies show that InstInfo is easy to use and that researchers find value in the recommendations. InstInfo can be accessed at https://instinfo.com.
Knowledge-Centered Dual-Process Reasoning for Math Word Problems With Large Language Models
IEEE Transactions on Knowledge and Data Engineering · 2025-04-01 · 6 citations
article
Solving math word problems (MWPs) is a critical milestone for assessing the text mining ability and knowledge mastery of models. Large language models (LLMs) have recently shown remarkable performance on MWPs. However, current LLMs still frequently make logical errors, which highlights their inability to fully grasp the knowledge required for genuine step-by-step mathematical reasoning. To this end, we propose a novel Knowledge-guided Solver (KNOS) framework that enables LLMs to simulate human mathematical reasoning; its core idea is to Invoke-Verify-Inject the knowledge necessary to solve MWPs. Drawing inspiration from dual-process theory, we construct two cooperative systems: a Knowledge System and an Inference System. Specifically, the Knowledge System employs LLMs as the knowledge base and introduces a novel knowledge invoker that elicits their relevant knowledge to support strict step-level mathematical reasoning. In the Inference System, we propose a knowledge verifier and a knowledge injector, which respectively evaluate the rationality of the knowledge and guide step-wise symbolic deduction in an interpretable manner based on human cognitive mechanisms. Moreover, to address the potential scarcity of mathematics-specific knowledge in LLMs, we consider an open-book exam scenario and propose an improved version of KNOS called EKNOS. In EKNOS, we carefully design knowledge selectors that extract the most relevant commonsense knowledge and math formulas from external knowledge sources for each reasoning step. This knowledge helps the knowledge invoker better stimulate the LLMs' reasoning abilities. Both KNOS and EKNOS can flexibly empower different LLMs. Our experiments with GPT-3, ChatGPT, and GPT-4 demonstrate both improved reasoning accuracy and strict step-wise interpretability of mathematical thinking.
The Law of Knowledge Overshadowing: Towards Understanding, Predicting, and Preventing LLM Hallucination
ArXiv.org · 2025-02-22 · 1 citation
preprint · Open access
Hallucination is a persistent challenge in large language models (LLMs): even with rigorous quality control, models often generate distorted facts. This paradox, in which errors persist despite high-quality training data, calls for a deeper understanding of the underlying LLM mechanisms. To address it, we propose a novel concept, knowledge overshadowing, in which a model's dominant knowledge can obscure less prominent knowledge during text generation, causing the model to fabricate inaccurate details. Building on this idea, we introduce a novel framework that quantifies factual hallucinations by modeling knowledge overshadowing. Central to our approach is a log-linear law, which predicts that the rate of factual hallucination increases linearly with the logarithm of (1) Knowledge Popularity, (2) Knowledge Length, and (3) Model Size. The law provides a means to quantify hallucinations preemptively, offering foresight into their occurrence even before model training or inference. Building on the overshadowing effect, we propose a new decoding strategy, CoDa, to mitigate hallucinations; it notably improves model factuality on Overshadow (27.9%), MemoTrap (13.1%), and NQ-Swap (18.3%). Our findings not only deepen understanding of the mechanisms behind hallucinations but also provide actionable insights for developing more predictable and controllable language models.
Learning to Slice: Self-Supervised Interpretable Hierarchical Representation Learning with Graph Auto-Encoder Tree
2025-08-01
article · Open access
The perceptions and decisions of individuals on social networks are deeply rooted in their intrinsic beliefs, which makes it possible to infer social beliefs from user behavior and message interactions. While existing research models these interactions as graphs and learns their representations, interpretability remains a significant challenge. In real-world scenarios, the interpretation of beliefs is nested within subject scopes of different granularity (such as topics and locations), posing additional challenges for belief discovery. In this paper, we introduce the Interpretable Graph Auto-Encoder Tree (IGAT), a novel end-to-end framework that jointly encodes hierarchical subject scopes and their corresponding beliefs as a unified, interpretable hierarchical representation. IGAT integrates the interpretable hierarchy of Model Trees with disentangled representation learning models. We propose a differentiable Slice Mechanism that dynamically optimizes internal node splitting while jointly training a leaf model to learn disentangled belief subspaces. Aggregating these subspaces yields a unified representation that offers interpretations of both subjects and beliefs. Experimental evaluations on three real-world Twitter datasets show that IGAT achieves a consistent improvement of 1.49%-5.61% in F1-score, accuracy, and purity on the belief discovery task, and demonstrate its effectiveness in various downstream analytical applications.
Interactive Information Need Prediction with Intent and Context
arXiv (Cornell University) · 2025-01-05
preprint · Open access · Senior author
The ability to predict a user's information need would have wide-ranging implications, from saving time and effort to mitigating vocabulary gaps. We study how to interactively predict a user's information need by letting them select a pre-search context (e.g., a paragraph, sentence, or single word) and specify an optional partial search intent (e.g., "how", "why", "applications"). We examine how various generative language models can explicitly make this prediction by generating a question, and how retrieval models can implicitly make it by retrieving an answer. We find that this prediction is possible in many cases and that a user-provided partial search intent can help mitigate the effect of large pre-search contexts. We conclude that this framework is promising and suitable for real-world applications.
Cache-of-Thought: Master-Apprentice Framework for Cost-Effective Vision Language Model Reasoning
2025-01-01
article · Open access
Mingyuan Wu, Jize Jiang, Haozhen Zheng, Meitang Li, Zhaoheng Li, Beitong Tian, Bo Chen, Yongjoo Park, Minjia Zhang, ChengXiang Zhai, Klara Nahrstedt. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing. 2025.
Information Retrieval for Artificial General Intelligence: A New Perspective of Information Retrieval Research
2025-07-13
article · 1st author · Corresponding author

Recent grants

III-COR: QueryClinic: Improve Search Accuracy for Difficult Queries
NSF · $397k · 2007–2011
CDI-Type II: Collaborative Research: Joint Image-Text Parsing and Reasoning for Analyzing Social and Political News Events
NSF · $500k · 2010–2015
CAREER: User-centered Adaptive Information Retrieval
NSF · $522k · 2004–2010
SaTC: CORE: Medium: Collaborative: Understanding and Discovering Illicit Online Business Through Automatic Analysis of Online Text Traces
NSF · $300k · 2018–2023
RI: Multi-Faceted Comparative Text Summarization
NSF · $200k · 2007–2010

Frequent coauthors

Jiawei Han
University of Illinois Urbana-Champaign
26 shared
Hui Fang
First Affiliated Hospital of Jiangxi Medical College
26 shared
Sean Massung
25 shared
Qiaozhu Mei
24 shared
Heng Ji
22 shared
Shanfeng Zhu
Fudan University
18 shared
Shengwen Peng
Fudan University
18 shared
Hiroshi Mamitsuka
Kyoto University
18 shared

Labs

Siebel School of Computing and Data Science (PI)

Education

Ph.D., Computer Science
University of Illinois at Urbana-Champaign
2003
M.S., Computer Science
University of Illinois at Urbana-Champaign
1999
B.S., Computer Science
University of Science and Technology of China
1996

Awards & honors

Campus Award for Excellence in Graduate Student Mentoring, U…
Rose Award for Teaching Excellence, College of Engineering,…
ACM SIGIR Gerard Salton Award (2021)
ACM SIGIR Academy Member (2020)
Donald Biggar Willett Professor in Engineering (2018)
