
About
We are pioneering the future of precision medicine through cutting-edge multiscale optical microscopy, automation and robotics, artificial intelligence, and large-scale bioimage informatics. Our imaging techniques span seven orders of magnitude—from the nanoscale to the mesoscale—enabling transformative advancements in precision medicine. Our lab's fusion of cross-scale imaging and AI-driven systems biology is setting the stage for unprecedented scientific discoveries and transformative personalized medicine.
Research topics
- Computer Science
- Artificial Intelligence
- Machine Learning
- Computer Security
- Software engineering
- Geography
- Programming language
- Data science
- Data Mining
- World Wide Web
- Operating system
- Computer network
- Database
- Engineering
Selected publications
ReCDroid+: Automated End-to-End Crash Reproduction from Bug Reports for Android Apps
ACM Transactions on Software Engineering and Methodology · 2022 · 33 citations
- Computer Science
- Computer Security
The large demand for mobile devices creates significant concerns about the quality of mobile applications (apps). Developers heavily rely on bug reports in issue tracking systems to reproduce failures (e.g., crashes). However, the process of crash reproduction is often manually done by developers, making the resolution of bugs inefficient, especially given that bug reports are often written in natural language. To improve the productivity of developers in resolving bug reports, in this paper, we introduce a novel approach, called ReCDroid+, that can automatically reproduce crashes from bug reports for Android apps. ReCDroid+ uses a combination of natural language processing (NLP), deep learning, and dynamic GUI exploration to synthesize event sequences with the goal of reproducing the reported crash. We have evaluated ReCDroid+ on 66 original bug reports from 37 Android apps. The results show that ReCDroid+ successfully reproduced 42 crashes (63.6% success rate) directly from the textual description of the manually reproduced bug reports. A user study involving 12 participants demonstrates that ReCDroid+ can improve the productivity of developers when resolving crash bug reports.
Pre-trained models: Past, present and future
AI Open · 2021 · 924 citations
- Computer Science
- Artificial Intelligence
Large-scale pre-trained models (PTMs) such as BERT and GPT have recently achieved great success and become a milestone in the field of artificial intelligence (AI). Owing to sophisticated pre-training objectives and huge model parameters, large-scale PTMs can effectively capture knowledge from massive labeled and unlabeled data. By storing knowledge into huge parameters and fine-tuning on specific tasks, the rich knowledge implicitly encoded in huge parameters can benefit a variety of downstream tasks, which has been extensively demonstrated via experimental verification and empirical analysis. It is now the consensus of the AI community to adopt PTMs as backbone for downstream tasks rather than learning models from scratch. In this paper, we take a deep look into the history of pre-training, especially its special relation with transfer learning and self-supervised learning, to reveal the crucial position of PTMs in the AI development spectrum. Further, we comprehensively review the latest breakthroughs of PTMs. These breakthroughs are driven by the surge of computational power and the increasing availability of data, towards four important directions: designing effective architectures, utilizing rich contexts, improving computational efficiency, and conducting interpretation and theoretical analysis. Finally, we discuss a series of open problems and research directions of PTMs, and hope our view can inspire and advance the future study of PTMs.
Learning to Detect Malicious Clients for Robust Federated Learning
arXiv (Cornell University) · 2020 · 186 citations
- Computer Science
- Computer Security
Federated learning systems are vulnerable to attacks from malicious clients. As the central server in the system cannot govern the behaviors of the clients, a rogue client may initiate an attack by sending malicious model updates to the server, so as to degrade the learning performance or enforce targeted model poisoning attacks (a.k.a. backdoor attacks). Therefore, timely detecting these malicious model updates and the underlying attackers becomes critically important. In this work, we propose a new framework for robust federated learning where the central server learns to detect and remove the malicious model updates using a powerful detection model, leading to targeted defense. We evaluate our solution in both image classification and sentiment analysis tasks with a variety of machine learning models. Experimental results show that our solution ensures robust federated learning that is resilient to both the Byzantine attacks and the targeted model poisoning attacks.
Boba: Authoring and Visualizing Multiverse Analyses
IEEE Transactions on Visualization and Computer Graphics · 2020 · 79 citations
1st author · Corresponding
- Computer Science
- Data science
Multiverse analysis is an approach to data analysis in which all "reasonable" analytic decisions are evaluated in parallel and interpreted collectively, in order to foster robustness and transparency. However, specifying a multiverse is demanding because analysts must manage myriad variants from a cross-product of analytic decisions, and the results require nuanced interpretation. We contribute Boba: an integrated domain-specific language (DSL) and visual analysis system for authoring and reviewing multiverse analyses. With the Boba DSL, analysts write the shared portion of analysis code only once, alongside local variations defining alternative decisions, from which the compiler generates a multiplex of scripts representing all possible analysis paths. The Boba Visualizer provides linked views of model results and the multiverse decision space to enable rapid, systematic assessment of consequential decisions and robustness, including sampling uncertainty and model fit. We demonstrate Boba's utility through two data analysis case studies, and reflect on challenges and design opportunities for multiverse analysis software.
BatchCrypt: Efficient homomorphic encryption for cross-silo federated learning
USENIX Annual Technical Conference (ATC) · 2020 · 275 citations
Senior author · Corresponding
- Computer Science
- Computer Security
Cross-silo federated learning (FL) enables organizations (e.g., financial or medical) to collaboratively train a machine learning model by aggregating local gradient updates from each client without sharing privacy-sensitive data. To ensure no update is revealed during aggregation, industrial FL frameworks allow clients to mask local gradient updates using additively homomorphic encryption (HE). However, this results in significant cost in computation and communication. In our characterization, HE operations dominate the training time, while inflating the data transfer amount by two orders of magnitude. In this paper, we present BatchCrypt, a system solution for cross-silo FL that substantially reduces the encryption and communication overhead caused by HE. Instead of encrypting individual gradients with full precision, we encode a batch of quantized gradients into a long integer and encrypt it in one go. To allow gradient-wise aggregation to be performed on ciphertexts of the encoded batches, we develop new quantization and encoding schemes along with a novel gradient clipping technique. We implemented BatchCrypt as a plug-in module in FATE, an industrial cross-silo FL framework. Evaluations with EC2 clients in geo-distributed datacenters show that BatchCrypt achieves 23×-93× training speedup while reducing the communication overhead by 66×-101×. The accuracy loss due to quantization errors is less than 1%. Copyright © Proc. of the 2020 USENIX Annual Technical Conference, ATC 2020. All rights reserved.
Machine Learning Testing: Survey, Landscapes and Horizons
IEEE Transactions on Software Engineering · 2020 · 813 citations
Senior author · Corresponding
- Computer Science
- Machine Learning
This paper provides a comprehensive survey of techniques for testing machine learning systems, that is, Machine Learning Testing (ML testing) research. It covers 144 papers on testing properties (e.g., correctness, robustness, and fairness), testing components (e.g., the data, learning program, and framework), testing workflow (e.g., test generation and test evaluation), and application scenarios (e.g., autonomous driving, machine translation). The paper also analyses trends concerning datasets, research trends, and research focus, concluding with research challenges and promising research directions in ML testing.
FedML: A Research Library and Benchmark for Federated Machine Learning
arXiv (Cornell University) · 2020 · 358 citations
- Computer Science
- Artificial Intelligence
Federated learning (FL) is a rapidly growing research field in machine learning. However, existing FL libraries cannot adequately support diverse algorithmic development; inconsistent dataset and model usage make fair algorithm comparison challenging. In this work, we introduce FedML, an open research library and benchmark to facilitate FL algorithm development and fair performance comparison. FedML supports three computing paradigms: on-device training for edge devices, distributed computing, and single-machine simulation. FedML also promotes diverse algorithmic research with flexible and generic API design and comprehensive reference baseline implementations (optimizer, models, and datasets). We hope FedML could provide an efficient and reproducible means for developing and evaluating FL algorithms that would benefit the FL research community. We maintain the source code, documents, and user community at https://fedml.ai.
Advanced Engineering Informatics · 2020 · 58 citations
- Computer Science
- Artificial Intelligence
Recent grants
Clarity and Efficiency in Design
NSF · $204k · 2006–2011
EAGER: From Clarity to Efficiency for Distributed Algorithms
NSF · $200k · 2012–2016
CI-P: Collaborative Research: Summarizing Opinion and Speaker Attitude in Speech
NSF · $37k · 2011–2014
FAI: Fairness in Machine Learning with Human in the Loop
NSF · $625k · 2021–2026
Collaborative Research: RI: Small: Wisdom of Crowds with Machines in the Loop
NSF · $233k · 2020–2025
Frequent coauthors
- 165 shared
Xiaofei Xie
Singapore Management University
- 149 shared
Jun Sun
Singapore Management University
- 121 shared
Jin Song Dong
- 106 shared
Maosong Sun
- 103 shared
Lei Ma
University of Alberta
- 81 shared
Felix Juefei-Xu
New York University
- 77 shared
Qing Guo
Agency for Science, Technology and Research
- 72 shared
Dilek Hakkani‐Tür
Education
- 2020
Ph.D., School of Computing
National University of Singapore
- 2005
Bachelor
National University of Singapore