
Wei-keng Liao
Research Professor · Northwestern University · Chemical Engineering
Active 1996–2026
About
Wei-keng Liao is a Research Professor at Northwestern Engineering, affiliated with the Center for Ultra-scale Computing and Information Security within the Department of Electrical and Computer Engineering. His research focuses on parallel and distributed file I/O and storage system design, data management for large-scale scientific applications, and the design and parallelization of data mining algorithms. This work supports high-performance computing systems and data processing for large-scale scientific and engineering applications.
Research topics
- Computer Science
- Artificial Intelligence
- Data Mining
- Machine Learning
- Algorithm
- Data science
- Physics
- Statistical physics
- Materials science
- Geometry
- Composite material
- Mathematics
- Nanotechnology
Selected publications
Evaluating large language models for inverse semiconductor design
Digital Discovery · 2026-01-01
Article · Open access
Large Language Models (LLMs) can enable inverse materials discovery by generating text-encoded crystal structures from target properties.
E3SM-Project/scorpio: SCORPIO version 1.9.3
Zenodo (CERN European Organization for Nuclear Research) · 2026-04-23
Other · Open access · Senior author
This patch release includes:
- Fix for issues with HDF5 iotypes for EAMXX history output
Determinism analysis in hybrid-LLM-GNN modeling for materials property prediction
Machine Learning Science and Technology · 2026-05-06
Article · Open access
Driven by advances in artificial intelligence and the growing availability of databases, machine learning now plays a central role in data-driven materials knowledge discovery. In studies that employ machine learning models, maintaining deterministic workflows is crucial for reproducibility, as it ensures that repeated runs with identical settings yield consistent and reliable predictions. Here, we conduct a determinism analysis of the recently proposed hybrid LLM-GNN framework for material property prediction. Our study provides a comprehensive analysis of sources of non-determinism, with particular focus on variability introduced by hardware variations and software stacks across training and inference pipelines. By distinguishing different forms of determinism, we assess how discrepancies arise under varying execution conditions. In cases where results are non-deterministic, we further analyze and compare prediction differences to assess the impact of these variations. Our investigation uncovers systematic patterns in prediction discrepancies at the dataset level as well as variations at the individual sample level. Based on the findings, we identify the main sources of divergence and provide practical recommendations to improve deterministic behavior in ML-based materials property prediction workflows.
E3SM-Project/scorpio: SCORPIO version 1.9.1
Zenodo (CERN European Organization for Nuclear Research) · 2026-01-14
Other · Open access · Senior author
SCORPIO v1.9.1 includes many fixes related to data compression:
- Fixed issues with multiple redef/enddef calls with HDF5
- Manual datatype conversion, as needed, for HDF5 output
- Fix typos for ADIOS+ZFP compression rate
- Disable HDF5+ZFP compression for scalars and 5D+ variables and variables with only unlimited dimensions
- Adding related tests
- put_vars_* support for CDF5 types when using PnetCDF
E3SM-Project/scorpio: SCORPIO v1.9.2
Open MIND · 2026-04-08
Other · Open access · Senior author
SCORPIO v1.9.2 includes the following fixes:
- Fix for build issues with some compilers due to missing uint64 type definition
Materials Research Express · 2026-03-12
Article · Open access
Graph neural networks (GNNs) have proven effective in understanding and predicting diverse material properties, even when working with limited datasets. An important step in training a GNN is to use an appropriate and informative graph embedding that can adequately represent the structural and compositional information in the chemical space. Current graph embeddings consist of composition and structure-agnostic element-level encodings, which are static in nature. This makes it challenging to differentiate between different compounds on the element level, especially for datasets with limited data size, thereby relying more on the complex input and architecture for model training. Here, we present a novel framework for GNN-based prediction tasks that uses dynamic embedding to significantly improve the models’ predictive ability on materials properties with limited data size. We evaluated the proposed framework on multiple materials datasets across various domains and found that the model trained using dynamic embedding outperforms the models trained using conventional static embedding and features obtained using a pre-trained model. The proposed framework holds significant potential for expediting artificial intelligence (AI)-driven materials discovery.
Constraint on Neutrino Statistics from Cosmological Data
ArXiv.org · 2025-01-21
Preprint · Open access · Senior author
We investigate the impact of neutrino statistical property on cosmology and the constraints imposed by cosmological data on neutrino statistics. Cosmological data from probes such as Cosmic Microwave Background (CMB) radiation and Baryon Acoustic Oscillation (BAO) are used to constrain the statistical parameter of neutrino. This constraint is closely related to the degeneracy effects among neutrino statistical property, the sum of neutrino masses, and the Hubble constant. Our results show that purely bosonic neutrinos can be ruled out at 95% confidence level and purely fermionic neutrinos are preferred.
An AI framework for time series microstructure prediction from processing parameters
Scientific Reports · 2025-07-05 · 6 citations
Article · Open access
In this study, we present an artificial intelligence (AI)-driven framework for predicting the microstructural texture of polycrystalline materials after a specific deformation process. The microstructural texture is defined in terms of the orientation distribution function (ODF) which indicates the volume density of crystal orientations. Our approach leverages an encoder-decoder model with Long Short-Term Memory (LSTM) layers to model the relationship between processing conditions and material properties. As a case study, we apply our framework to copper, generating a dataset of 3125 unique processing parameter combinations and their corresponding ODF vectors. The resulting predictions enable the calculation of homogenized properties. Our AI-driven framework outperforms traditional material processing simulations, yielding faster results with limited error rates (< 0.3% for both the elastic matrix C and the compliance matrix S), making it a promising tool for the expedited design of microstructures with tailored properties.
Parallel Data Object Creation: Towards Scalable Metadata Management in High-Performance I/O Library
ArXiv.org · 2025-06-18
Preprint · Open access · Senior author
High-level I/O libraries, such as HDF5 and PnetCDF, are commonly used by large-scale scientific applications to perform I/O tasks in parallel. These I/O libraries store the metadata such as data types and dimensionality along with the raw data in the same files. While these libraries are well-optimized for concurrent access to the raw data, they are designed neither to handle a large number of data objects efficiently nor to create different data objects independently by multiple processes, as they require applications to call data object creation APIs collectively with consistent metadata among all processes. Applications that process data gathered from remote sensors, such as particle collision experiments in high-energy physics, may generate data of different sizes from different sensors and desire to store them as separate data objects. For such applications, the I/O library's requirement on collective data object creation can become very expensive, as the cost of the metadata consistency check increases with the metadata volume as well as the number of processes. To address this limitation, using PnetCDF as an experimental platform, we investigate solutions in this paper that abide by the netCDF file format, as well as propose a new file header format that enables independent data object creation. The proposed file header consists of two sections, an index table and a list of metadata blocks. The index table contains the references to the metadata blocks and each block stores metadata of objects that can be created collectively or independently. The new design achieves scalable performance, cutting data object creation times by up to 582x when running on 4096 MPI processes to create 5,684,800 data objects in parallel. Additionally, the new method reduces the memory footprint, with each process requiring an amount of memory space inversely proportional to the number of processes.
Parallel Data Object Creation: Scalable Metadata Management in Parallel I/O Library
2025-11-07 · 1 citation
Article · Open access · Senior author
High-level I/O libraries, such as PnetCDF and HDF5, are commonly used by large-scale scientific applications to perform I/O tasks in parallel. These I/O libraries store the metadata of data objects in files along with their raw data. To ensure metadata consistency during parallel data object creation, they require applications to call the metadata APIs collectively using consistent metadata. Such a requirement can result in an expensive consistency check, as its cost increases with the metadata volume and the number of processes. To address this limitation, we propose a new file header format, which uses partitioned metadata blocks to enable independent data object creation and reduce the objects required for the consistency check. Our performance evaluation shows that this new design achieves scalable performance, cutting data object creation times by up to 196× when running on 4096 MPI processes to create 5,684,800 data objects in parallel.
Recent grants
STTR Phase II: A Design-Driven Educational Robotics Framework
NSF · $1.1M · 2021–2025
Frequent coauthors
- 181 shared
Alok Choudhary
Northwestern University
- 117 shared
Ankit Agrawal
Northwestern University
- 34 shared
Alok Choudhary
Northwestern University
- 24 shared
Vishu Gupta
- 22 shared
Robert Ross
Argonne National Laboratory
- 19 shared
Dipendra Jha
- 18 shared
Sunwoo Lee
Inha University
- 16 shared
Reda Al-Bahrani