Yizhou Sun

Professor · Verified

University of California, Los Angeles · Computer Science

Active 2001–2026

h-index: 54
Citations: 16.3k
Papers: 415 (224 in the last 5 years)
Funding: $980k

About

Yizhou Sun is a professor in the Department of Computer Science at the UCLA Samueli School of Engineering. Her research interests include data mining, database systems, information retrieval, machine learning, and network science. She earned her PhD from the University of Illinois at Urbana-Champaign in 2012. Her awards include the Data Mining Test of Time Award in 2024, the SDM / IBM Early Career Data Mining Research Award in 2023, the IEEE Intelligent Systems Top 10 Rising Stars in 2023, the VLDB Test of Time Award in 2022, and the NSF CAREER Award in 2015. Her work has been recognized for its contributions to data science and related areas.

Research topics

  • Computer Science
  • Artificial Intelligence
  • Data Mining
  • Machine Learning
  • Theoretical computer science
  • Data science
  • Distributed computing
  • Information Retrieval
  • Computer Security
  • Mathematics
  • Database
  • Software engineering
  • World Wide Web

Selected publications

  • A Survey on Graph Neural Network Acceleration: Algorithms, Systems, and Customized Hardware

    ACM Computing Surveys · 2026-03-27

    article · Open access · Senior author

    Graph neural networks (GNNs) are emerging for machine learning research on graph-structured data. GNNs achieve state-of-the-art performance on many tasks, but they face scalability challenges when it comes to real-world applications that have numerous data and strict latency requirements. Many studies have been conducted on how to accelerate GNNs in an effort to address these challenges. These acceleration techniques touch on various aspects of the GNN pipeline, from smart training and inference algorithms to efficient systems and customized hardware. As the amount of research on GNN acceleration has grown rapidly, there lacks a systematic treatment to provide a unified view and address the complexity of relevant works. In this survey, we provide a taxonomy of GNN acceleration, review the existing approaches, and suggest future research directions. Our taxonomic treatment of GNN acceleration connects the existing works and sets the stage for further development in this area.

  • From Newborn to Impact: Bias-Aware Citation Prediction

    2026-04-09

    article · Open access

    As a key to accessing research impact, citation dynamics underpins research evaluation, scholarly recommendation, and the study of knowledge diffusion. Citation prediction is particularly critical for newborn papers, where early assessment must be performed without citation signals and under highly long-tailed distributions. We identify two key research gaps: (i) insufficient modeling of implicit factors of scientific impact, leading to reliance on coarse proxies; and (ii) a lack of bias-aware learning that can deliver stable predictions on lowly cited papers. We address these gaps by proposing a Bias-Aware Citation Prediction Framework, which combines multi-agent feature extraction with robust graph representation learning. First, a multi-agent x graph co-learning module derives fine-grained, interpretable signals, such as reproducibility, collaboration network, and text quality, from metadata and external resources, and fuses them with heterogeneous-network embeddings to provide rich supervision even in the absence of early citation signals. Second, we incorporate a set of robust mechanisms: a two-stage forward process that routes explicit factors through an intermediate exposure estimate, GroupDRO to optimize worst-case group risk across environments, and a regularization head that performs what-if analyses on controllable factors under monotonicity and smoothness constraints. Comprehensive experiments on two real-world datasets demonstrate the effectiveness of our proposed model. Specifically, our model achieves around a 13% reduction in error metrics (MALE and RMSLE) and a notable 5.5% improvement in the ranking metric (NDCG) over the baseline methods.

  • Functional brain growth trajectories across the first decade of life from a single longitudinal cohort

    Research Square · 2026-05-14

    preprint · Open access

  • Using graph neural network and symbolic regression to model disordered systems

    Scientific Reports · 2025-07-01 · 1 citation

    article · Open access

    The key to modeling disordered systems lies in accurately simulating atomic trajectories, typically achieved through molecular dynamic (MD) simulation. The accuracy of MD simulations depends on the precision of the interatomic potential function, which dictates the calculations of atom movements. Traditionally, deriving interatomic potential function relies on extensive prior physical knowledge and high computational cost. This study introduces a novel approach that integrates machine learning with molecular dynamic methods to provide precise interatomic potential energy calculations for disordered systems.

  • Dynamic Generation of Multi-LLM Agents Communication Topologies with Graph Diffusion Models

    ArXiv.org · 2025-10-09

    preprint · Open access

    The efficiency of multi-agent systems driven by large language models (LLMs) largely hinges on their communication topology. However, designing an optimal topology is a non-trivial challenge, as it requires balancing competing objectives such as task performance, communication cost, and robustness. Existing frameworks often rely on static or hand-crafted topologies, which inherently fail to adapt to diverse task requirements, leading to either excessive token consumption for simple problems or performance bottlenecks for complex ones. To address this challenge, we introduce a novel generative framework called Guided Topology Diffusion (GTD). Inspired by conditional discrete graph diffusion models, GTD formulates topology synthesis as an iterative construction process. At each step, the generation is steered by a lightweight proxy model that predicts multi-objective rewards (e.g., accuracy, utility, cost), enabling real-time, gradient-free optimization towards task-adaptive topologies. This iterative, guided synthesis process distinguishes GTD from single-step generative frameworks, enabling it to better navigate complex design trade-offs. We validated GTD across multiple benchmarks, and experiments show that this framework can generate highly task-adaptive, sparse, and efficient communication topologies, significantly outperforming existing methods in LLM agent collaboration.

  • Open-Set Living Need Prediction with Large Language Models

    ArXiv.org · 2025-06-03

    preprint · Open access

    Living needs are the needs people generate in their daily lives for survival and well-being. On life service platforms like Meituan, user purchases are driven by living needs, making accurate living need predictions crucial for personalized service recommendations. Traditional approaches treat this prediction as a closed-set classification problem, severely limiting their ability to capture the diversity and complexity of living needs. In this work, we redefine living need prediction as an open-set classification problem and propose PIGEON, a novel system leveraging large language models (LLMs) for unrestricted need prediction. PIGEON first employs a behavior-aware record retriever to help LLMs understand user preferences, then incorporates Maslow's hierarchy of needs to align predictions with human living needs. For evaluation and application, we design a recall module based on a fine-tuned text embedding model that links flexible need descriptions to appropriate life services. Extensive experiments on real-world datasets demonstrate that PIGEON significantly outperforms closed-set approaches on need-based life service recall by an average of 19.37%. Human evaluation validates the reasonableness and specificity of our predictions. Additionally, we employ instruction tuning to enable smaller LLMs to achieve competitive performance, supporting practical deployment.

  • LIFT: LLM-Based Pragma Insertion for HLS via GNN Supervised Fine-Tuning

    ArXiv.org · 2025-04-29

    preprint · Open access

    FPGAs are increasingly adopted in datacenter environments for their reconfigurability and energy efficiency. High-Level Synthesis (HLS) tools have eased FPGA programming by raising the abstraction level from RTL to untimed C/C++, yet attaining high performance still demands expert knowledge and iterative manual insertion of optimization pragmas to modify the microarchitecture. To address this challenge, we propose LIFT, a large language model (LLM)-based coding assistant for HLS that automatically generates performance-critical pragmas given a C/C++ design. We fine-tune the LLM by tightly integrating and supervising the training process with a graph neural network (GNN), combining the sequential modeling capabilities of LLMs with the structural and semantic understanding of GNNs necessary for reasoning over code and its control/data dependencies. On average, LIFT produces designs that improve performance by 3.52x and 2.16x over the prior state-of-the-art AutoDSE and HARP, respectively, and by 66x over GPT-4o.

  • Genetic insights into colorectal cancer pathogenesis: a multi-omics and immunity perspective

    Translational Cancer Research · 2025-11-01 · 1 citation

    article · Open access · Senior author · Corresponding

    Background: Colorectal cancer (CRC) is a global health issue influenced by both genetic and environmental factors. Identifying key genes closely associated with CRC is crucial for understanding its pathological mechanisms and discovering therapeutic targets. This study aimed to integrate multi-omics datasets and Mendelian randomization (MR) approaches to identify CRC-related genes and to clarify their roles in tumor immunity and therapeutic potential. Methods: This is a cross-sectional study. We utilized databases such as Gene Expression Omnibus (GEO), Finnish National Genome Project (FinnGen), expression Quantitative Trait Loci (eQTL), and The Cancer Genome Atlas (TCGA), and employed MR, summary-data-based MR (SMR), and gene expression analyses to screen genes related to CRC development. Through immune cell infiltration analysis and mediation MR, we explored the relationship between these genes, tumor immunity, and immunotherapy. Results: were found to be protective factors. Additionally, the differential expression of these genes in CRC tissues was validated through immunohistochemistry (IHC) and reverse transcription quantitative polymerase chain reaction (RT-qPCR) experiments. Mediation MR analysis demonstrated that immune cell phenotypes could mediate the effect of these genes on CRC. These genes were also found to be associated with tumor mutational burden (TMB) and microsatellite instability (MSI) scores. Conclusions: Our findings reveal how these genes contribute to CRC pathogenesis by modulating the immune microenvironment, providing important biomarkers and targets for the development of novel therapeutic strategies.

  • FUSE: Measure-Theoretic Compact Fuzzy Set Representation for Taxonomy Expansion

    ArXiv.org · 2025-06-10

    preprint · Open access · Senior author

    Taxonomy Expansion, which models complex concepts and their relations, can be formulated as a set representation learning task. The generalization of set, fuzzy set, incorporates uncertainty and measures the information within a semantic concept, making it suitable for concept modeling. Existing works usually model sets as vectors or geometric objects such as boxes, which are not closed under set operations. In this work, we propose a sound and efficient formulation of set representation learning based on its volume approximation as a fuzzy set. The resulting embedding framework, Fuzzy Set Embedding (FUSE), satisfies all set operations and compactly approximates the underlying fuzzy set, hence preserving information while being efficient to learn, relying on minimum neural architecture. We empirically demonstrate the power of FUSE on the task of taxonomy expansion, where FUSE achieves remarkable improvements up to 23% compared with existing baselines. Our work marks the first attempt to understand and efficiently compute the embeddings of fuzzy sets.

  • Iceberg: Enhancing HLS Modeling with Synthetic Data

    2025-06-26 · 1 citation

    article

    Deep learning-based prediction models for High-Level Synthesis (HLS) of hardware designs often struggle to generalize. In this paper, we study how to close the generalizability gap of these models through pretraining on synthetic data and introduce Iceberg, a synthetic data augmentation approach that expands both large language model (LLM)-generated programs and weak labels of unseen design configurations. Our weak label generation method is integrated with an in-context model architecture, enabling meta-learning from actual and proximate labels. Iceberg improves the geometric mean modeling accuracy by 86.4% when adapted to six real-world applications with few-shot examples and achieves a 2.47× and a 1.12× better offline DSE performance when adapting to two different test datasets. Our open-sourced code is here: https://github.com/UCLA-VAST/iceberg.

Awards & honors

  • Data Mining Test of Time Award, 2024
  • SDM / IBM Early Career Data Mining Research Award, 2023
  • IEEE Intelligent Systems Top 10 Rising Stars, 2023
  • VLDB Test of Time Award, 2022
  • Amazon Research Award, 2020