Alok Choudhary

Harold Washington Professor of Electrical and Computer Engineering and Computer Science · Verified

Northwestern University · Chemical Engineering

Active 1987–2025

h-index: 65

Citations: 21.1k

Papers: 666 (60 in the last 5 years)

Funding: $7.4M

About

Alok Choudhary is the Harold Washington Professor of Electrical and Computer Engineering and Computer Science at Northwestern University. He is associated with the Center for Ultra-scale Computing and Information Security (CUCIS) and is part of the faculty in the Department of Electrical and Computer Engineering and the Department of Computer Science. His research focuses on artificial intelligence, machine learning, and their applications in materials science and electron microscopy. Choudhary has contributed to advancing deep learning methods, AI-driven materials property prediction, and the development of transfer learning frameworks for predictive analytics on small datasets. His work impacts various scientific fields by integrating AI techniques to enhance understanding and innovation in materials research and computational science.

Research topics

Computer Science
Artificial Intelligence
Machine Learning
Data Mining
Algorithm
Data science
Materials science
Composite material
Mathematics
Physics
Statistical physics
Geometry
Nanotechnology

Selected publications

Machine Learning Predicts Mortality in Patients With Systemic Sclerosis-Associated Interstitial Lung Disease From Electronic Health Record Data
American Journal of Respiratory and Critical Care Medicine · 2025-05-01
article
Abstract Rationale: Interstitial lung disease (ILD) affects 40-75% of patients with systemic sclerosis (SSc) and is the leading cause of death in this population. SSc-ILD is a heterogeneous disease with a variable clinical course. Current available therapies preserve lung function; however, benefits appear to be modest and are counterbalanced by toxicity. While biomarkers have been reported for progressive disease, their utility is limited. We hypothesize that machine learning (ML) could improve mortality prediction in patients with SSc and SSc-ILD by leveraging readily available electronic health record (EHR) data. Methods: We used data from participants with SSc recruited to the Northwestern University Scleroderma Registry from 1996-2024. EHR data—clinical, laboratory, and spirometric—were extracted, and features were selected. ILD diagnosis was assigned by adjudication of chest CT reports. Multiple ML algorithms were tested to build models to predict mortality (or lung transplant) using EHR features in the entire SSc cohort and a subgroup of those with SSc-ILD. A 70:10:20 patient-wise split for training:validation:testing was implemented. Optuna was employed for hyperparameter optimization on the validation set, and the best model was selected. Receiver operating characteristic (ROC) analysis was used to evaluate each model on the held-out test set. Feature importance was assessed through ablation analysis. Results: 1,170 participants with SSc were available for analysis, encompassing 9,191 person-years of observation. 193 (16%) participants died during the observation period. 709 (61%) had CT data available, and 454 (64%) of these had SSc-ILD for subgroup analysis. 109 (24%) participants with SSc-ILD died during the observation period. EHR features predicted mortality in SSc patients within one (AUC=0.91), three (AUC=0.89), and five years (AUC=0.82) (Figure 1A). 
Ablation analysis identified features highly predictive of one-year mortality in SSc that are routinely assessed but rarely utilized, including blood counts and chemistries (Figure 1B). Similarly, ML algorithms used EHR features to predict mortality in a subgroup of patients with SSc-ILD within one (AUC=0.71), three (AUC=0.73), and five years (AUC=0.80) (Figure 1C). Ablation analysis again identified predictive features for one-year mortality in those with SSc-ILD (Figure 1D). Conclusions: ML analysis of readily available EHR data predicts mortality in those with SSc and SSc-ILD with high sensitivity and specificity. Ablation analyses identified features predictive of one-year mortality that are routinely collected but rarely assessed to ascertain risk. These models could assist in clinical decision-making, particularly regarding treatment. Future research will integrate quantitative imaging features and employ deep learning models to further improve model performance.
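The 70:10:20 patient-wise split described in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `patient_wise_split`, the ID format, and the use of Python's `random` module are all assumptions. The point it shows is that the split is made over unique patients rather than over individual records, so no patient contributes data to more than one partition.

```python
import random


def patient_wise_split(patient_ids, percents=(70, 10, 20), seed=0):
    """Split unique patient IDs into train/validation/test sets.

    `percents` are integer percentages (hypothetical default 70:10:20).
    Every record of a given patient ends up in exactly one partition.
    """
    if sum(percents) != 100:
        raise ValueError("percents must sum to 100")

    # Deduplicate first: a patient may appear in many records.
    unique_ids = sorted(set(patient_ids))
    rng = random.Random(seed)
    rng.shuffle(unique_ids)

    n = len(unique_ids)
    n_train = n * percents[0] // 100
    n_val = n * percents[1] // 100

    train = set(unique_ids[:n_train])
    val = set(unique_ids[n_train:n_train + n_val])
    test = set(unique_ids[n_train + n_val:])  # remainder goes to test
    return train, val, test


if __name__ == "__main__":
    # Three records per patient, 100 hypothetical patients.
    record_patient_ids = [f"P{i:03d}" for i in range(100)] * 3
    train, val, test = patient_wise_split(record_patient_ids, seed=1)
    print(len(train), len(val), len(test))
```

Records are then assigned to a partition by looking up their patient ID in the three sets, which avoids the leakage that a record-level random split would introduce.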
MicroProcSim: A Software for Simulation of Microstructure Evolution
Integrating materials and manufacturing innovation · 2025-06-23 · 2 citations
article · Open access
Understanding the large deformation behavior of materials under external forces is crucial for reliable engineering applications. The mechanical properties of materials depend on their underlying microstructures, which change over time during deformation. Experimental observation of these processes is time-consuming and influenced by various conditions. Therefore, we developed MicroProcSim, a physics-based simulation tool to replicate the deformation process of cubic microstructures. MicroProcSim can predict the evolution of texture, represented by the orientation distribution function (ODF), over time under various loads and strain rates. This software package can be run on both Windows and Linux operating systems. Unlike conventional crystal plasticity finite element software, MicroProcSim offers a distinct advantage by rapidly generating deformed textures, as it bypasses incorporating grain morphology. Furthermore, comparisons with existing experimental and computational studies on texture evolution have demonstrated that this software seamlessly replicates real-world material processing conditions through a simple modification of a single input matrix. Editor’s Video Summary: The online version of this article (10.1007/s40192-025-00405-6) contains an Editor's Video Summary, which is available to authorized users.
REN: Anatomically-Informed Mixture-of-Experts for Interstitial Lung Disease Diagnosis
arXiv (Cornell University) · 2025-10-06
preprint · Open access
Mixture-of-Experts (MoE) architectures achieve scalable learning by routing inputs to specialized subnetworks through conditional computation. However, conventional MoE designs assume homogeneous expert capability and domain-agnostic routing, assumptions that are fundamentally misaligned with medical imaging, where anatomical structure and regional disease heterogeneity govern pathological patterns. We introduce Regional Expert Networks (REN), the first anatomically-informed MoE framework for medical image classification. REN encodes anatomical priors by training seven specialized experts, each dedicated to a distinct lung lobe or bilateral lung combination, enabling precise modeling of region-specific pathological variation. Multi-modal gating mechanisms dynamically integrate radiomics biomarkers with deep learning (DL) features extracted by convolutional (CNN), Transformer (ViT), and state-space (Mamba) architectures to weight expert contributions at inference. Applied to interstitial lung disease (ILD) classification on a 597-patient, 1,898-scan longitudinal cohort, REN achieves consistently superior performance: the radiomics-guided ensemble attains an average AUC of 0.8646 ± 0.0467, a +12.5% improvement over the SwinUNETR single-model baseline (AUC 0.7685, p=0.031). Lower-lobe experts reach AUCs of 0.88-0.90, outperforming DL baselines (CNN: 0.76-0.79) and mirroring known patterns of basal ILD progression. Evaluated under rigorous patient-level cross-validation, REN demonstrates strong generalizability and clinical interpretability, establishing a scalable, anatomically-guided framework potentially extensible to other structured medical imaging tasks. Code is available on GitHub: https://github.com/NUBagciLab/MoE-REN.
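The idea of a gate weighting expert contributions at inference can be illustrated with a generic mixture-of-experts combination. This sketch is not the REN implementation (see the linked repository for that): the softmax gate, the function names, and the example numbers are assumptions chosen only to show how per-expert predictions and gating scores combine into one output.

```python
import math


def softmax(scores):
    """Turn arbitrary gating scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def combine_experts(expert_probs, gate_scores):
    """Weighted average of per-expert probabilities.

    expert_probs: one predicted probability per expert (hypothetical values).
    gate_scores:  one unnormalized gating score per expert (hypothetical).
    """
    if len(expert_probs) != len(gate_scores):
        raise ValueError("need one gate score per expert")
    weights = softmax(gate_scores)
    return sum(w * p for w, p in zip(weights, expert_probs))


if __name__ == "__main__":
    # Seven illustrative experts, e.g. one per lung region.
    probs = [0.62, 0.55, 0.71, 0.88, 0.90, 0.60, 0.66]
    scores = [0.1, 0.0, 0.3, 1.2, 1.4, 0.2, 0.4]
    print(round(combine_experts(probs, scores), 4))
```

With equal gating scores the result reduces to a plain average of the experts; larger scores shift the output toward the corresponding experts.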
Imaging-Based Mortality Prediction in Patients with Systemic Sclerosis
Lecture notes in computer science · 2025-11-13
book chapter · Open access
An AI framework for time series microstructure prediction from processing parameters
Scientific Reports · 2025-07-05 · 6 citations
article · Open access
In this study, we present an artificial intelligence (AI)-driven framework for predicting the microstructural texture of polycrystalline materials after a specific deformation process. The microstructural texture is defined in terms of the orientation distribution function (ODF) which indicates the volume density of crystal orientations. Our approach leverages an encoder-decoder model with Long Short-Term Memory (LSTM) layers to model the relationship between processing conditions and material properties. As a case study, we apply our framework to copper, generating a dataset of 3125 unique processing parameter combinations and their corresponding ODF vectors. The resulting predictions enable the calculation of homogenized properties. Our AI-driven framework outperforms traditional material processing simulations, yielding faster results with limited error rates (< 0.3% for both the elastic matrix C and the compliance matrix S), making it a promising tool for the expedited design of microstructures with tailored properties.
Machine Learning Analysis of Electronic Health Records Identifies Interstitial Lung Disease and Predicts Mortality in Patients with Systemic Sclerosis
medRxiv · 2025-06-04
preprint · Open access
Abstract Background Interstitial lung disease (ILD) affects >40% of patients with systemic sclerosis (SSc/scleroderma) and is the leading cause of disease-related mortality. Although therapies may slow progression, outcomes remain poor, partly because ILD is often detected after irreversible lung injury has occurred. Although chest computed tomography (CT) is a sensitive tool for ILD detection and is recommended at SSc diagnosis, it is oftentimes not performed and even less often performed serially. We sought to develop tools to predict ILD and mortality in patients with SSc using data routinely available in the electronic health record (EHR) to inform medical decision-making. Methods We analyzed longitudinal EHR data from two SSc cohorts: Northwestern University (1,169 participants; derivation cohort) and Yale University (376 participants; validation cohort). We identified clinical features from existing cohort-linked EHR queries composing a convenience sample of data from participants spanning decades rather than employing a single unified data collection effort. Three ILD experts independently reviewed CT reports and classified each as having or lacking ILD. To explore derivation cohort data structure, patients with ≥3 forced vital capacity (FVC) results available were identified and stratified according to prevalent or absent ILD. Using unsupervised trajectory-based clustering exploratory analyses, we determined standardized patterns across groups. ML models were then developed using clinical EHR data as predictor variables and prevalent ILD and all-cause mortality as outcome variables. Model performance was assessed using area under the receiver operating characteristic curve (AUC). Results Seventy-four clinical features with low missingness, including demographic, vital sign, laboratory, and pulmonary function test data, were utilized for analyses.
Four robust PFT trajectory clusters were identified that were associated with ILD prevalence and mortality in exploratory analyses. An ML model for ILD detection achieved an AUC of 0.832 and retained performance in the Yale cohort (AUC 0.754). In addition to established predictors such as autoantibodies and pulmonary function, the model identified routine laboratory measurements, including red cell distribution width (RDW), white blood cell count, and serum chloride, as important contributors. One-year mortality prediction achieved AUCs of 0.904 in the Northwestern cohort and 0.910 in the Yale cohort. Among patients with SSc-ILD, one-year mortality was predicted with AUCs of 0.744 and 0.902 in the Northwestern and Yale cohorts, respectively. Unexpectedly, we found that subtle laboratory abnormalities (such as change in RDW) contributed to predicting mortality. Conclusions Our prediction models, composed of widely available EHR data, are useful tools to identify SSc patients at high risk for prevalent ILD and all-cause mortality. Integration of these models into clinical practice could enable scalable risk stratification and inform individualized ILD screening and monitoring strategies for SSc patients.
Parallel Data Object Creation: Towards Scalable Metadata Management in High-Performance I/O Library
arXiv · 2025-06-18
preprint · Open access
High-level I/O libraries, such as HDF5 and PnetCDF, are commonly used by large-scale scientific applications to perform I/O tasks in parallel. These I/O libraries store the metadata such as data types and dimensionality along with the raw data in the same files. While these libraries are well-optimized for concurrent access to the raw data, they are designed neither to handle a large number of data objects efficiently nor to create different data objects independently by multiple processes, as they require applications to call data object creation APIs collectively with consistent metadata among all processes. Applications that process data gathered from remote sensors, such as particle collision experiments in high-energy physics, may generate data of different sizes from different sensors and desire to store them as separate data objects. For such applications, the I/O library's requirement on collective data object creation can become very expensive, as the cost of the metadata consistency check increases with the metadata volume as well as the number of processes. To address this limitation, using PnetCDF as an experimental platform, we investigate solutions in this paper that abide by the netCDF file format, as well as propose a new file header format that enables independent data object creation. The proposed file header consists of two sections, an index table and a list of metadata blocks. The index table contains the references to the metadata blocks, and each block stores metadata of objects that can be created collectively or independently. The new design achieves scalable performance, cutting data object creation times by up to 582x when running on 4096 MPI processes to create 5,684,800 data objects in parallel. Additionally, the new method reduces the memory footprint, with each process requiring an amount of memory space inversely proportional to the number of processes.
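The two-section header described in the abstract above (an index table plus a list of metadata blocks) can be pictured with a small in-memory model. This is only a conceptual sketch: the class names `FileHeader` and `MetadataBlock`, the `owner_rank` field, and the lookup method are hypothetical, and nothing here reflects the actual PnetCDF API, its on-disk layout, or MPI communication.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataBlock:
    """Metadata for the objects created by one (hypothetical) process."""
    owner_rank: int
    objects: dict = field(default_factory=dict)  # name -> (dtype, shape)


@dataclass
class FileHeader:
    """Index table of references into a list of metadata blocks."""
    index_table: list = field(default_factory=list)  # positions in `blocks`
    blocks: list = field(default_factory=list)

    def add_block(self, owner_rank):
        # Register a new block and record a reference to it in the index.
        block = MetadataBlock(owner_rank)
        self.blocks.append(block)
        self.index_table.append(len(self.blocks) - 1)
        return block

    def lookup(self, name):
        # Walk the index table to find the block holding `name`.
        for block_id in self.index_table:
            block = self.blocks[block_id]
            if name in block.objects:
                return block.objects[name]
        raise KeyError(name)


if __name__ == "__main__":
    header = FileHeader()
    for rank in range(4):
        # Each simulated rank fills its own block without consulting others.
        block = header.add_block(rank)
        block.objects[f"sensor_{rank}"] = ("float32", (rank + 1,))
    print(header.lookup("sensor_2"))
```

Because each simulated process writes only to its own block, object creation in this model needs no cross-process consistency check; only the small index table is shared.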
Radiomic Features Detect Interstitial Lung Disease in Patients With Systemic Sclerosis
American Journal of Respiratory and Critical Care Medicine · 2025-05-01 · 1 citation
article
Abstract Rationale: Interstitial lung disease (ILD) is a common complication of systemic sclerosis (SSc) associated with significant morbidity and mortality. Professional society guidelines vary but some recommend screening all patients with SSc at diagnosis with computed tomography (CT). Early detection of ILD, however, remains challenging, as subtle CT features can be of unclear significance. Radiomics, which involves extracting quantitative features from information-rich medical imaging, can capture subtle textural and structural changes that are not easily observable to radiologists. We hypothesize that by leveraging radiomic features, machine learning models can aid clinicians in early ILD detection and risk stratification in SSc. Methods: We analyzed CT scans performed between 2015-2024 on patients with SSc enrolled in the Northwestern Scleroderma Registry. Radiomic feature extraction yielded 107 first-order features that detailed the lung's texture, shape, and intensity patterns. Principal component analysis (PCA) was utilized to reduce these features to 12 principal components, capturing the majority of variance in the data. Optuna, an optimization library, was employed to build models to detect ILD using radiomic features. These models included XGBoost, random forest, logistic regression, LightGBM, and gradient boosting. Receiver operating characteristic analysis was implemented to assess the accuracy of model predictions using a 64:16:20 train:validation:test split. Large Language Models (LLMs) were used to assist in plot generation with authors reviewing output. Results: 1221 CT scans were available for analysis from 434 patients with SSc (83% female, median age 58 yr [IQR: 50, 66]). 808 CTs (66%) had evidence of ILD reported by a thoracic radiologist. PCA plots of distinct patterns of radiomic features from CT scans clustered patients with and without ILD (Figure 1A).
Using ML algorithms via the Optuna framework, radiomic features successfully detected ILD in SSc patients. We used the highest validation area under the receiver operating characteristic curve (AUROC) to select the model and apply on an unseen test set and obtained an AUROC of 0.88 (Figure 1B). Conclusion: Distinct radiomic patterns were identified through ML algorithms that detect ILD in SSc patients with good discrimination. Our findings underscore the utility of radiomics in diagnosis in this high-risk population. Future research will integrate clinical data with radiomic features via deep learning models to improve predictive performance, thereby improving the care of patients with systemic sclerosis.
Parallel Data Object Creation: Scalable Metadata Management in Parallel I/O Library
2025-11-07 · 1 citation
article · Open access
High-level I/O libraries, such as PnetCDF and HDF5, are commonly used by large-scale scientific applications to perform I/O tasks in parallel. These I/O libraries store the metadata of data objects in files along with their raw data. To ensure metadata consistency during parallel data object creation, they require applications to call the metadata APIs collectively using consistent metadata. Such a requirement can result in an expensive consistency check, as its cost increases with the metadata volume and the number of processes. To address this limitation, we propose a new file header format, which uses partitioned metadata blocks to enable independent data object creation and reduce the number of objects that require a consistency check. Our performance evaluation shows that this new design achieves scalable performance, cutting data object creation times by up to 196× when running on 4096 MPI processes to create 5,684,800 data objects in parallel.
Machine Learning to Predict the Onset of Ventilator-associated Pneumonia Using Electronic Health Record Data
American Journal of Respiratory and Critical Care Medicine · 2025-05-01 · 1 citation
article
Abstract Background: Ventilator-associated pneumonia (VAP) is one of the deadliest hospital-acquired infections, with a mortality rate ranging from 25-70%. Currently, clinical models to predict VAP development are limited. Development of such a model would potentially allow physicians to intervene earlier with diagnostics and treatment, thereby improving patient outcomes. Methods: We examined VAP episodes from patients enrolled in the SCRIPT study, a cohort study of patients on mechanical ventilation who underwent a bronchoalveolar lavage for suspected pneumonia. A team of five attending physicians reviewed patient charts and adjudicated VAP episodes. We visualized patient-day features using hierarchical clustering with Ward's method. Clinical features such as vital signs, ventilator parameters, laboratory values, and medication data were used to develop several machine learning models trained to identify, on a day-to-day basis, patients at high risk of developing VAP within the next 7 days. LLMs were used to assist with coding, with all output reviewed by the team. We used five-fold cross-validation. For explainability, we used SHAP plots to examine which clinical features impacted model decision-making. Results: We examined ICU stay data from 688 patients, 268 of whom developed VAP. Median patient age was 62 (IQR 51-71), and 59% were male; 42% died. Our dataset had 1,296 ICU-days occurring within seven days before a VAP episode. For clean model training, we used patient days from patients who were adjudicated not to have pneumonia on enrollment into the study and who did not develop VAP during their stay to label the negative class (646 days). Visualization using hierarchical clustering showed correlation with length of hospitalization and ventilation (Figure 1A). The best-performing models using XGBoost had a mean AUROC of 0.774 with standard deviation of 0.026 (Figure 1B).
Important features based on SHAP included PEEP, platelets, and day of hospitalization (Figure 1C). Conclusions: Machine learning models can predict VAP onset within the next 7 days with moderate performance. Future work will focus on revising feature selection and attempting alternative machine learning strategies such as deep learning models.

Recent grants

Collaborative Research: Advanced Compiler Optimizations and Programming Language Enhancements for Petascale I/O and Storage
NSF · $278k · 2008–2013
NSF Young Investigator: Compiler and Runtime Optimization Techniques for Parallel Programming on Distributed Memory Machines
NSF · $213k · 1993–1998
RIA: Design, Analysis, Simulation and Evaluation of Multi-Level Caches for Scalable Multiprocessors
NSF · $70k · 1991–1994
Collaborative Research: High-Performance Techniques, Designs and Implementation of software Infrastructure for Change Detection and Mining
NSF · $514k · 2005–2009
Collaborative Research: An Application Driven I/O Optimization Approach for PetaScale Systems and Scientific Discoveries
NSF · $320k · 2010–2014

Frequent coauthors

Wei‐keng Liao
Northwestern University
181 shared
Ankit Agrawal
Northwestern University
179 shared
Mahmut Kandemir
Pennsylvania State University
73 shared
Gokhan Memik
44 shared
J. Ramanujam
Louisiana State University
39 shared
R. Ponnusamy
36 shared
Rajeev Thakur
34 shared
Geoffrey Fox
28 shared

Labs

Choudhary Lab · PI
