Kevin Chenchuan Chang

Professor · Verified

University of Illinois Urbana-Champaign · Computer Science

Active 1996–2025

h-index: 45
Citations: 10.3k
Papers: 244 (85 in the last 5 years)
Funding: $3.4M

About

Kevin Chen-Chuan Chang is a Professor in the Department of Computer Science at the University of Illinois at Urbana-Champaign. He received a Bachelor of Science degree from National Taiwan University and a Ph.D. in Electrical Engineering from Stanford University in 2001. His research addresses large-scale information access and knowledge acquisition, focusing on search, mining, and integration across structured and unstructured big data. His current research emphasizes Web search and mining, as well as social media analytics. He leads the FORWARD Data Lab group within the Data and Information Systems Laboratories at UIUC. His work aims to bridge structured and unstructured data to enable semantic-rich access to vast amounts of information. His research spans natural language processing, data mining, data management, information retrieval, and machine learning, with a focus on applications in Web and social media-based knowledge organization. He has received numerous awards, including the ICDE 10-Year Test of Time Award in 2022, NSF CAREER Award in 2002, and multiple teaching awards at the University of Illinois. He has also co-founded a startup, Cazoodle, and developed GrantForward.com, a funding discovery service used by over 200 institutions.

Research topics

  • Computer Science
  • Artificial Intelligence
  • Mathematics
  • Theoretical computer science
  • Management science
  • Combinatorics
  • Cognitive science
  • Psychology
  • Engineering

Selected publications

  • Understanding Cross-Domain Adaptation in Low-Resource Topic Modeling

    ArXiv.org · 2025-06-09

    preprint · Open access · Senior author

    Topic modeling plays a vital role in uncovering hidden semantic structures within text corpora, but existing models struggle in low-resource settings where limited target-domain data leads to unstable and incoherent topic inference. We address this challenge by formally introducing domain adaptation for low-resource topic modeling, where a high-resource source domain informs a low-resource target domain without overwhelming it with irrelevant content. We establish a finite-sample generalization bound showing that effective knowledge transfer depends on robust performance in both domains, minimizing latent-space discrepancy, and preventing overfitting to the data. Guided by these insights, we propose DALTA (Domain-Aligned Latent Topic Adaptation), a new framework that employs a shared encoder for domain-invariant features, specialized decoders for domain-specific nuances, and adversarial alignment to selectively transfer relevant information. Experiments on diverse low-resource datasets demonstrate that DALTA consistently outperforms state-of-the-art methods in terms of topic coherence, stability, and transferability.

  • Conversion of the Treatise on Invertebrate Paleontology volumes into a FAIR database

    Abstracts with programs - Geological Society of America · 2025-01-01

    article
  • Understanding Cross-Domain Adaptation in Low-Resource Topic Modeling

    2025-01-01

    article · Open access · Senior author

    Topic modeling plays a vital role in uncovering hidden semantic structures within text corpora, but existing models struggle in low-resource settings where limited target-domain data leads to unstable and incoherent topic inference. We address this challenge by formally introducing domain adaptation for low-resource topic modeling, where a high-resource source domain informs a low-resource target domain without overwhelming it with irrelevant content. We establish a finite-sample generalization bound showing that effective knowledge transfer depends on robust performance in both domains, minimizing latent-space discrepancy, and preventing overfitting to the data. Guided by these insights, we propose DALTA (Domain-Aligned Latent Topic Adaptation), a new framework that employs a shared encoder for domain-invariant features, specialized decoders for domain-specific nuances, and adversarial alignment to selectively transfer relevant information. Experiments on diverse low-resource datasets demonstrate that DALTA consistently outperforms state-of-the-art methods in terms of topic coherence, stability, and transferability.

  • ERU-KG: Efficient Reference-aligned Unsupervised Keyphrase Generation

    2025-01-01

    article · Open access · Senior author

    Unsupervised keyphrase prediction has gained growing interest in recent years. However, existing methods typically rely on heuristically defined importance scores, which may lead to inaccurate informativeness estimation. In addition, they lack consideration for time efficiency. To solve these problems, we propose ERU-KG, an unsupervised keyphrase generation (UKG) model that consists of an informativeness and a phraseness module. The former estimates the relevance of keyphrase candidates, while the latter generates those candidates. The informativeness module innovates by learning to model informativeness through references (e.g., queries, citation contexts, and titles) and at the term level, thereby 1) capturing how the key concepts of documents are perceived in different contexts and 2) estimating informativeness of phrases more efficiently by aggregating term informativeness, removing the need for explicit modeling of the candidates. ERU-KG demonstrates its effectiveness on keyphrase generation benchmarks by outperforming unsupervised baselines and achieving on average 89% of the performance of a supervised model for top 10 predictions. Additionally, to highlight its practical utility, we evaluate the model on text retrieval tasks and show that keyphrases generated by ERU-KG are effective when employed as query and document expansions. Furthermore, inference speed tests reveal that ERU-KG is the fastest among baselines of similar model sizes. Finally, our proposed model can switch between keyphrase generation and extraction by adjusting hyperparameters, catering to diverse application requirements.

  • MuSha: Subgraph Matching by Multilevel Sharing

    2025-05-19

    article

    Subgraph matching (SM) is a fundamental problem in graph data analysis. Real-world patterns used in graph analysis are often symmetric and contain isomorphic substructures, but existing SM algorithms fail to explore such properties. To fill this gap, we propose MuSha, a multi-objective optimization framework for SM, leveraging multilevel sharing of isomorphic substructure results to speed up SM and symmetry breaking to avoid directly computing symmetric results. To efficiently compute and cache intermediate results for sharing, MuSha applies worst-case optimal joins (WCOJs) and utilizes trie data structures to compress and index results. To enable multilevel sharing, MuSha solves a multi-objective optimization problem involving pattern decomposition, symmetry breaking, WCOJ orders, and trie structural orders. Experimental results demonstrate that MuSha outperforms the state of the art by up to two orders of magnitude on graphs of millions of vertices.

  • Query Optimization for Parametric Knowledge Refinement in Retrieval-Augmented Large Language Models

    2025-01-01

    article · Open access · Senior author

    We introduce the Extract-Refine-Retrieve-Read (ERRR) framework, a novel approach designed to bridge the pre-retrieval information gap in Retrieval-Augmented Generation (RAG) systems through query optimization tailored to meet the specific knowledge requirements of Large Language Models (LLMs). Unlike conventional query optimization techniques used in RAG, the ERRR framework begins by extracting parametric knowledge from LLMs, followed by using a specialized query optimizer for refining these queries. This process ensures the retrieval of only the most pertinent information essential for generating accurate responses. Moreover, to enhance flexibility and reduce computational costs, we propose a trainable scheme for our pipeline that utilizes a smaller, tunable model as the query optimizer, which is refined through knowledge distillation from a larger teacher model. Our evaluations on various question-answering (QA) datasets and with different retrieval systems show that ERRR consistently outperforms existing baselines, proving to be a versatile and cost-effective module for improving the utility and accuracy of RAG systems.

  • CASPER: Concept-integrated Sparse Representation for Scientific Retrieval

    ArXiv.org · 2025-08-18

    preprint · Open access · Senior author

    Identifying relevant research concepts is crucial for effective scientific search. However, primary sparse retrieval methods often lack concept-aware representations. To address this, we propose CASPER, a sparse retrieval model for scientific search that utilizes both tokens and keyphrases as representation units (i.e., dimensions in the sparse embedding space). This enables CASPER to represent queries and documents via research concepts and match them at both granular and conceptual levels. Furthermore, we construct training data by leveraging abundant scholarly references (including titles, citation contexts, author-assigned keyphrases, and co-citations), which capture how research concepts are expressed in diverse settings. Empirically, CASPER outperforms strong dense and sparse retrieval baselines across eight scientific retrieval benchmarks. We also explore the effectiveness-efficiency trade-off via representation pruning and demonstrate CASPER's interpretability by showing that it can serve as an effective and efficient keyphrase generation model.

  • RL-based Query Rewriting with Distilled LLM for online E-Commerce Systems

    ArXiv.org · 2025-01-29

    preprint · Open access · Senior author

    Query rewriting (QR) is a critical technique in e-commerce search, addressing the lexical gap between user queries and product descriptions to enhance search performance. Existing QR approaches typically fall into two categories: discriminative models and generative methods leveraging large language models (LLMs). Discriminative models often struggle with natural language understanding and offer limited flexibility in rewriting, while generative LLMs, despite producing high-quality rewrites, face high inference latency and cost in online settings. These limitations force offline deployment, making them vulnerable to issues like information staleness and semantic drift. To overcome these challenges, we propose a novel hybrid pipeline for QR that balances efficiency and effectiveness. Our approach combines offline knowledge distillation to create a lightweight but efficient student model with online reinforcement learning (RL) to refine query rewriting dynamically using real-time feedback. A key innovation is the use of LLMs as simulated human feedback, enabling scalable reward signals and cost-effective evaluation without manual annotations. Experimental results on Amazon ESCI dataset demonstrate significant improvements in query relevance, diversity, and adaptability, as well as positive feedback from the LLM simulation. This work contributes to advancing LLM capabilities for domain-specific applications, offering a robust solution for dynamic and complex e-commerce search environments.

  • Social Media and Orthopaedics: Establishing Your Online Reputation.

    PubMed · 2025-01-01

    review · 1st author · Corresponding

    With the rise of internet and social media usage in the 21st century, patients have increasingly been looking to online resources for information regarding their health care. It is imperative for physicians to recognize the trends and role of these tools in clinical orthopaedic practice, and to harness these tools to educate users, connect with other physicians, and interact with current and potential patients. It is important to review the current literature regarding social media in orthopaedics; some commonly used social media platforms and their individual characteristics; and general guidelines for creating content and managing an online reputation.

  • MiniELM: A Lightweight and Adaptive Query Rewriting Framework for E-Commerce Search Optimization

    2025-01-01

    article · Open access · Senior author

    Query rewriting (QR) is a critical technique in e-commerce search, addressing the lexical gap between user queries and product descriptions to enhance search performance. Existing QR approaches typically fall into two categories: discriminative models and generative methods leveraging large language models (LLMs). Discriminative models often struggle with natural language understanding and offer limited flexibility in rewriting, while generative LLMs, despite producing high-quality rewrites, face high inference latency and cost in online settings. These limitations force offline deployment, making them vulnerable to issues like information staleness and semantic drift. To overcome these challenges, we propose a novel hybrid pipeline for QR that balances efficiency and effectiveness. Our approach combines offline knowledge distillation to create a lightweight but efficient student model with online reinforcement learning (RL) to refine query rewriting dynamically using real-time feedback. A key innovation is the use of LLMs as simulated human feedback, enabling scalable reward signals and cost-effective evaluation without manual annotations. Experimental results on Amazon ESCI dataset demonstrate significant improvements in query relevance, diversity, and adaptability, as well as positive feedback from the LLM simulation. This work contributes to advancing LLM capabilities for domain-specific applications, offering a robust solution for dynamic and complex e-commerce search environments.

Frequent coauthors

  • Jie Huang

    Chinese University of Hong Kong

    32 shared
  • Vincent W. Zheng

    Rutgers, The State University of New Jersey

    25 shared
  • Wen‐mei Hwu

    University of Illinois Urbana-Champaign

    18 shared
  • Pritom Saha Akash

    16 shared
  • Bin He

    University of Illinois Urbana-Champaign

    15 shared
  • Jinjun Xiong

    15 shared
  • Yuan Fang

    Singapore Management University

    12 shared
  • Jie Huang

    11 shared

Education

  • Ph.D., Electrical Engineering

    Stanford University

    2001
  • B.S.

    National Taiwan University

Awards & honors

  • ICDE 10-Year Test of Time Award (2022)
  • Best Paper Selection/Awards in VLDB 2000 and 2013, ASONAM 2019
  • NSF CAREER Award (2002)
  • NCSA Faculty Fellow Award (2003)