Yang Liu

Professor

University of Illinois Urbana-Champaign · Bioengineering

Active 1981–2024

h-index 104

Citations 58.3k

Papers 3.5k · 1864 last 5y

Funding $1.7M · 1 active



About

We develop multiscale optical microscopy, automation and robotics, artificial intelligence, and large-scale bioimage informatics for precision medicine. Our imaging techniques span seven orders of magnitude, from the nanoscale to the mesoscale. By combining cross-scale imaging with AI-driven systems biology, our lab aims to enable new scientific discoveries and personalized medicine.

Research topics

Computer Science
Artificial Intelligence
Machine Learning
Computer Security
Software engineering
Geography
Programming language
Data science
Data Mining
World Wide Web
Operating system
Computer network
Database
Engineering

Selected publications

ReCDroid+: Automated End-to-End Crash Reproduction from Bug Reports for Android Apps
ACM Transactions on Software Engineering and Methodology · 2022 · 33 citations
- Computer Science
- Computer Security
The large demand of mobile devices creates significant concerns about the quality of mobile applications (apps). Developers heavily rely on bug reports in issue tracking systems to reproduce failures (e.g., crashes). However, the process of crash reproduction is often manually done by developers, making the resolution of bugs inefficient, especially given that bug reports are often written in natural language. To improve the productivity of developers in resolving bug reports, in this paper, we introduce a novel approach, called ReCDroid+, that can automatically reproduce crashes from bug reports for Android apps. ReCDroid+ uses a combination of natural language processing (NLP), deep learning, and dynamic GUI exploration to synthesize event sequences with the goal of reproducing the reported crash. We have evaluated ReCDroid+ on 66 original bug reports from 37 Android apps. The results show that ReCDroid+ successfully reproduced 42 crashes (63.6% success rate) directly from the textual description of the manually reproduced bug reports. A user study involving 12 participants demonstrates that ReCDroid+ can improve the productivity of developers when resolving crash bug reports.
Pre-trained models: Past, present and future
AI Open · 2021 · 924 citations
- Computer Science
- Artificial Intelligence
Large-scale pre-trained models (PTMs) such as BERT and GPT have recently achieved great success and become a milestone in the field of artificial intelligence (AI). Owing to sophisticated pre-training objectives and huge model parameters, large-scale PTMs can effectively capture knowledge from massive labeled and unlabeled data. By storing knowledge into huge parameters and fine-tuning on specific tasks, the rich knowledge implicitly encoded in huge parameters can benefit a variety of downstream tasks, which has been extensively demonstrated via experimental verification and empirical analysis. It is now the consensus of the AI community to adopt PTMs as backbone for downstream tasks rather than learning models from scratch. In this paper, we take a deep look into the history of pre-training, especially its special relation with transfer learning and self-supervised learning, to reveal the crucial position of PTMs in the AI development spectrum. Further, we comprehensively review the latest breakthroughs of PTMs. These breakthroughs are driven by the surge of computational power and the increasing availability of data, towards four important directions: designing effective architectures, utilizing rich contexts, improving computational efficiency, and conducting interpretation and theoretical analysis. Finally, we discuss a series of open problems and research directions of PTMs, and hope our view can inspire and advance the future study of PTMs.
Learning to Detect Malicious Clients for Robust Federated Learning
arXiv (Cornell University) · 2020 · 186 citations
- Computer Science
- Computer Security
Federated learning systems are vulnerable to attacks from malicious clients. As the central server in the system cannot govern the behaviors of the clients, a rogue client may initiate an attack by sending malicious model updates to the server, so as to degrade the learning performance or enforce targeted model poisoning attacks (a.k.a. backdoor attacks). Therefore, timely detecting these malicious model updates and the underlying attackers becomes critically important. In this work, we propose a new framework for robust federated learning where the central server learns to detect and remove the malicious model updates using a powerful detection model, leading to targeted defense. We evaluate our solution in both image classification and sentiment analysis tasks with a variety of machine learning models. Experimental results show that our solution ensures robust federated learning that is resilient to both the Byzantine attacks and the targeted model poisoning attacks.
Boba: Authoring and Visualizing Multiverse Analyses
IEEE Transactions on Visualization and Computer Graphics · 2020 · 79 citations
1st author · Corresponding
- Computer Science
- Data science
Multiverse analysis is an approach to data analysis in which all "reasonable" analytic decisions are evaluated in parallel and interpreted collectively, in order to foster robustness and transparency. However, specifying a multiverse is demanding because analysts must manage myriad variants from a cross-product of analytic decisions, and the results require nuanced interpretation. We contribute Boba: an integrated domain-specific language (DSL) and visual analysis system for authoring and reviewing multiverse analyses. With the Boba DSL, analysts write the shared portion of analysis code only once, alongside local variations defining alternative decisions, from which the compiler generates a multiplex of scripts representing all possible analysis paths. The Boba Visualizer provides linked views of model results and the multiverse decision space to enable rapid, systematic assessment of consequential decisions and robustness, including sampling uncertainty and model fit. We demonstrate Boba's utility through two data analysis case studies, and reflect on challenges and design opportunities for multiverse analysis software.
BatchCrypt: Efficient homomorphic encryption for cross-silo federated learning
2020 · 275 citations
Senior author · Corresponding
- Computer Science
- Computer Security
Cross-silo federated learning (FL) enables organizations (e.g., financial or medical) to collaboratively train a machine learning model by aggregating local gradient updates from each client without sharing privacy-sensitive data. To ensure no update is revealed during aggregation, industrial FL frameworks allow clients to mask local gradient updates using additively homomorphic encryption (HE). However, this results in significant cost in computation and communication. In our characterization, HE operations dominate the training time, while inflating the data transfer amount by two orders of magnitude. In this paper, we present BatchCrypt, a system solution for cross-silo FL that substantially reduces the encryption and communication overhead caused by HE. Instead of encrypting individual gradients with full precision, we encode a batch of quantized gradients into a long integer and encrypt it in one go. To allow gradient-wise aggregation to be performed on ciphertexts of the encoded batches, we develop new quantization and encoding schemes along with a novel gradient clipping technique. We implemented BatchCrypt as a plug-in module in FATE, an industrial cross-silo FL framework. Evaluations with EC2 clients in geo-distributed datacenters show that BatchCrypt achieves 23×-93× training speedup while reducing the communication overhead by 66×-101×. The accuracy loss due to quantization errors is less than 1%. Copyright © Proc. of the 2020 USENIX Annual Technical Conference, ATC 2020. All rights reserved.
Machine Learning Testing: Survey, Landscapes and Horizons
IEEE Transactions on Software Engineering · 2020 · 813 citations
Senior author · Corresponding
- Computer Science
- Machine Learning
This paper provides a comprehensive survey of techniques for testing machine learning systems; Machine Learning Testing (ML testing) research. It covers 144 papers on testing properties (e.g., correctness, robustness, and fairness), testing components (e.g., the data, learning program, and framework), testing workflow (e.g., test generation and test evaluation), and application scenarios (e.g., autonomous driving, machine translation). The paper also analyses trends concerning datasets, research trends, and research focus, concluding with research challenges and promising research directions in ML testing.
FedML: A Research Library and Benchmark for Federated Machine Learning
arXiv (Cornell University) · 2020 · 358 citations
- Computer Science
- Artificial Intelligence
Federated learning (FL) is a rapidly growing research field in machine learning. However, existing FL libraries cannot adequately support diverse algorithmic development; inconsistent dataset and model usage make fair algorithm comparison challenging. In this work, we introduce FedML, an open research library and benchmark to facilitate FL algorithm development and fair performance comparison. FedML supports three computing paradigms: on-device training for edge devices, distributed computing, and single-machine simulation. FedML also promotes diverse algorithmic research with flexible and generic API design and comprehensive reference baseline implementations (optimizer, models, and datasets). We hope FedML could provide an efficient and reproducible means for developing and evaluating FL algorithms that would benefit the FL research community. We maintain the source code, documents, and user community at https://fedml.ai.
A context-augmented deep learning approach for worker trajectory prediction on unstructured and dynamic construction sites
Advanced Engineering Informatics · 2020 · 58 citations
- Computer Science
- Artificial Intelligence

Recent grants

Clarity and Efficiency in Design
NSF · $204k · 2006–2011
EAGER: From Clarity to Efficiency for Distributed Algorithms
NSF · $200k · 2012–2016
CI-P: Collaborative Research: Summarizing Opinion and Speaker Attitude in Speech
NSF · $37k · 2011–2014
FAI: Fairness in Machine Learning with Human in the Loop
NSF · $625k · 2021–2026
Collaborative Research: RI: Small: Wisdom of Crowds with Machines in the Loop
NSF · $233k · 2020–2025

Frequent coauthors

Xiaofei Xie
Singapore Management University
165 shared
Jun Sun
Singapore Management University
149 shared
Jin Song Dong
121 shared
Maosong Sun
106 shared
Lei Ma
University of Alberta
103 shared
Felix Juefei-Xu
New York University
81 shared
Qing Guo
Agency for Science, Technology and Research
77 shared
Dilek Hakkani‐Tür
72 shared

Education

Ph.D., School of Computing
National University of Singapore
2020
Bachelor
National University of Singapore
2005
