Wei-keng Liao

Research Professor · Verified

Northwestern University · Chemical Engineering

Active 1996–2026

h-index 38

Citations 6.9k

Papers 277 · 90 last 5y

Funding $1.1M

About

Wei-keng Liao is a Research Professor at Northwestern Engineering, affiliated with the Center for Ultra-scale Computing and Information Security within the Department of Electrical and Computer Engineering. His research focuses on parallel and distributed file I/O and storage system design, data management for large-scale scientific applications, and data mining algorithm design and their parallelization. His work contributes to advancing high-performance computing systems and data processing techniques, supporting large-scale scientific and engineering applications.

Research topics

Computer Science
Artificial Intelligence
Data Mining
Machine Learning
Algorithm
Data science
Physics
Statistical physics
Materials science
Geometry
Composite material
Mathematics
Nanotechnology

Selected publications

Evaluating large language models for inverse semiconductor design
Digital Discovery · 2026-01-01
article · Open access
Large Language Models (LLMs) can enable inverse materials discovery by generating text-encoded crystal structures from target properties.
E3SM-Project/scorpio: SCORPIO version 1.9.3
Zenodo (CERN European Organization for Nuclear Research) · 2026-04-23
other · Open access · Senior author
This patch release includes a fix for issues with HDF5 iotypes for EAMXX history output.
Determinism analysis in hybrid-LLM-GNN modeling for materials property prediction
Machine Learning Science and Technology · 2026-05-06
article · Open access
Driven by advances in artificial intelligence and the growing availability of databases, machine learning now plays a central role in data-driven materials knowledge discovery. In studies that employ machine learning models, maintaining deterministic workflows is crucial for reproducibility, as it ensures that repeated runs with identical settings yield consistent and reliable predictions. Here, we conduct a determinism analysis of the recently proposed hybrid LLM-GNN framework for material property prediction. Our study provides a comprehensive analysis of sources of non-determinism, with particular focus on variability introduced by hardware variations and software stacks across training and inference pipelines. By distinguishing different forms of determinism, we assess how discrepancies arise under varying execution conditions. In cases where results are non-deterministic, we further analyze and compare prediction differences to assess the impact of these variations. Our investigation uncovers systematic patterns in prediction discrepancies at the dataset level as well as variations at the individual sample level. Based on the findings, we identify the main sources of divergence and provide practical recommendations to improve deterministic behavior in ML-based materials property prediction workflows.
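The notion of run-to-run determinism in the abstract above can be illustrated with a small, hedged sketch. It is not the paper's method: `toy_predict`, `is_run_to_run_deterministic`, and `reduction_order_gap` are hypothetical names, and floating-point reduction order is only one commonly cited source of divergence that the abstract itself does not name.

```python
import random


def toy_predict(seed, n=1000):
    """Hypothetical stand-in for a model run: a seeded pseudo-random sum."""
    rng = random.Random(seed)  # private generator, so global state is untouched
    values = [rng.uniform(-1, 1) for _ in range(n)]
    return sum(values)


def is_run_to_run_deterministic(seed, repeats=3):
    """Repeat the run with identical settings and compare results bit for bit."""
    results = {toy_predict(seed) for _ in range(repeats)}
    return len(results) == 1


def reduction_order_gap():
    """Floating-point addition is not associative, so a different summation
    order (as different hardware or libraries may choose) can change the
    result even when the inputs are identical."""
    left = (0.1 + 0.2) + 0.3
    right = 0.1 + (0.2 + 0.3)
    return left, right
```

Comparing results for exact equality, rather than within a tolerance, is the stricter check; a tolerance-based comparison would answer a different question about how large the discrepancies are.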
E3SM-Project/scorpio: SCORPIO version 1.9.1
Zenodo (CERN European Organization for Nuclear Research) · 2026-01-14
other · Open access · Senior author
SCORPIO v1.9.1 includes many fixes related to data compression: fixed issues with multiple redef/enddef calls with HDF5; manual datatype conversion, as needed, for HDF5 output; fixed typos for the ADIOS+ZFP compression rate; disabled HDF5+ZFP compression for scalars, 5D+ variables, and variables with only unlimited dimensions; added related tests; put_vars_* support for CDF5 types when using PnetCDF.
E3SM-Project/scorpio: SCORPIO v1.9.2
Open MIND · 2026-04-08
other · Open access · Senior author
SCORPIO v1.9.2 includes a fix for build issues with some compilers due to a missing uint64 type definition.
Dynamic embedding representation for graph neural networks to enhance materials property prediction with limited datasets
Materials Research Express · 2026-03-12
article · Open access
Graph neural networks (GNNs) have proven effective in understanding and predicting diverse material properties, even when working with limited datasets. An important step in training a GNN is to use an appropriate and informative graph embedding that can adequately represent the structural and compositional information in the chemical space. Current graph embeddings consist of composition and structure-agnostic element-level encodings, which are static in nature. This makes it challenging to differentiate between different compounds on the element level, especially for datasets with limited data size, thereby relying more on the complex input and architecture for model training. Here, we present a novel framework for GNN-based prediction tasks that uses dynamic embedding to significantly improve the models’ predictive ability on materials properties with limited data size. We evaluated the proposed framework on multiple materials datasets across various domains to find that the model trained using dynamic embedding outperforms the models trained using conventional static embedding and features obtained using a pre-trained model. The proposed framework holds significant potential for expediting artificial intelligence (AI)-driven materials discovery.
Constraint on Neutrino Statistics from Cosmological Data
ArXiv.org · 2025-01-21
preprint · Open access · Senior author
We investigate the impact of neutrino statistical property on cosmology and the constraints imposed by cosmological data on neutrino statistics. Cosmological data from probes such as Cosmic Microwave Background (CMB) radiation and Baryon Acoustic Oscillation (BAO) are used to constrain the statistical parameter of neutrino. This constraint is closely related to the degeneracy effects among neutrino statistical property, the sum of neutrino masses, and the Hubble constant. Our results show that purely bosonic neutrinos can be ruled out at 95% confidence level and purely fermionic neutrinos are preferred.
An AI framework for time series microstructure prediction from processing parameters
Scientific Reports · 2025-07-05 · 6 citations
article · Open access
In this study, we present an artificial intelligence (AI)-driven framework for predicting the microstructural texture of polycrystalline materials after a specific deformation process. The microstructural texture is defined in terms of the orientation distribution function (ODF) which indicates the volume density of crystal orientations. Our approach leverages an encoder-decoder model with Long Short-Term Memory (LSTM) layers to model the relationship between processing conditions and material properties. As a case study, we apply our framework to copper, generating a dataset of 3125 unique processing parameter combinations and their corresponding ODF vectors. The resulting predictions enable the calculation of homogenized properties. Our AI-driven framework outperforms traditional material processing simulations, yielding faster results with limited error rates (< 0.3% for both the elastic matrix C and the compliance matrix S), making it a promising tool for the expedited design of microstructures with tailored properties.
Parallel Data Object Creation: Towards Scalable Metadata Management in High-Performance I/O Library
ArXiv.org · 2025-06-18
preprint · Open access · Senior author
High-level I/O libraries, such as HDF5 and PnetCDF, are commonly used by large-scale scientific applications to perform I/O tasks in parallel. These I/O libraries store the metadata such as data types and dimensionality along with the raw data in the same files. While these libraries are well-optimized for concurrent access to the raw data, they are designed neither to handle a large number of data objects efficiently nor to create different data objects independently by multiple processes, as they require applications to call data object creation APIs collectively with consistent metadata among all processes. Applications that process data gathered from remote sensors, such as particle collision experiments in high-energy physics, may generate data of different sizes from different sensors and desire to store them as separate data objects. For such applications, the I/O library's requirement on collective data object creation can become very expensive, as the cost of metadata consistency check increases with the metadata volume as well as the number of processes. To address this limitation, using PnetCDF as an experimental platform, we investigate solutions in this paper that abide by the netCDF file format, as well as propose a new file header format that enables independent data object creation. The proposed file header consists of two sections, an index table and a list of metadata blocks. The index table contains the reference to the metadata blocks and each block stores metadata of objects that can be created collectively or independently. The new design achieves scalable performance, cutting data object creation times by up to 582x when running on 4096 MPI processes to create 5,684,800 data objects in parallel. Additionally, the new method reduces the memory footprint, with each process requiring an amount of memory space inversely proportional to the number of processes.
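The two-section header described in the abstract above (an index table that references a list of metadata blocks) can be pictured with a minimal in-memory sketch. This is an illustrative toy model only, not PnetCDF's API or the paper's on-disk format: the class names, the `build_header` helper, and the `sensor{rank}_obj{i}` naming are all hypothetical, and MPI ranks are simulated with a plain loop.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataBlock:
    """Metadata for the data objects created by one writer (hypothetical layout)."""
    owner_rank: int
    objects: dict = field(default_factory=dict)  # object name -> (dtype, shape)

    def create_object(self, name, dtype, shape):
        # Independent creation: only this block is touched, so this toy model
        # needs no consistency check against other processes.
        if name in self.objects:
            raise ValueError(f"object {name!r} already exists in this block")
        self.objects[name] = (dtype, tuple(shape))


@dataclass
class FileHeader:
    """Two-section header: an index table plus a list of metadata blocks."""
    index_table: list = field(default_factory=list)  # entries: (owner_rank, block position)
    blocks: list = field(default_factory=list)

    def add_block(self, block):
        self.index_table.append((block.owner_rank, len(self.blocks)))
        self.blocks.append(block)

    def lookup(self, name):
        # Follow each index entry to its block until the object is found.
        for owner_rank, pos in self.index_table:
            meta = self.blocks[pos].objects.get(name)
            if meta is not None:
                return owner_rank, meta
        raise KeyError(name)


def build_header(num_ranks, objects_per_rank):
    """Simulate each rank filling its own block, then registering it in the index."""
    header = FileHeader()
    for rank in range(num_ranks):
        block = MetadataBlock(owner_rank=rank)
        for i in range(objects_per_rank):
            # Shapes differ per rank, mimicking sensors that emit different data sizes.
            block.create_object(f"sensor{rank}_obj{i}", "float32", (rank + 1, i + 1))
        header.add_block(block)
    return header
```

In this sketch each simulated rank holds only its own block, which loosely mirrors the abstract's point that per-process memory shrinks as the number of processes grows; the real design's serialization and collective steps are not modeled here.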
Parallel Data Object Creation: Scalable Metadata Management in Parallel I/O Library
2025-11-07 · 1 citation
article · Open access · Senior author
High-level I/O libraries, such as PnetCDF and HDF5, are commonly used by large-scale scientific applications to perform I/O tasks in parallel. These I/O libraries store the metadata of data objects in files along with their raw data. To ensure metadata consistency during parallel data object creation, they require applications to call the metadata APIs collectively using consistent metadata. Such a requirement can result in an expensive consistency check, as its cost increases with the metadata volume and the number of processes. To address this limitation, we propose a new file header format, which uses partitioned metadata blocks to enable independent data object creation and reduce the number of objects that require a consistency check. Our performance evaluation shows that this new design achieves scalable performance, cutting data object creation times by up to 196× when running on 4096 MPI processes to create 5,684,800 data objects in parallel.

Recent grants

STTR Phase II: A Design-Driven Educational Robotics Framework
NSF · $1.1M · 2021–2025

Frequent coauthors

Alok Choudhary
Northwestern University
181 shared
Ankit Agrawal
Northwestern University
117 shared
Alok Choudhary
Northwestern University
34 shared
Vishu Gupta
24 shared
Robert Ross
Argonne National Laboratory
22 shared
Dipendra Jha
19 shared
Sunwoo Lee
Inha University
18 shared
Reda Al-Bahrani
16 shared
