Matthew Turk

Professor Emeritus · Verified

University of Illinois Urbana-Champaign · Interdisciplinary Computing and the Arts

Active 1985–2026

h-index: 42

Citations: 9.9k

Papers: 237 (55 in the last 5 years)

Funding: $3.6M



About

The research in our lab uses advanced data science techniques to understand how water, plants, geology and climate interact in a tightly coupled system – and how humans are changing this system.

Research topics

Astrophysics
Physics
Astronomy

Selected publications

libyt: An In Situ Interface Connecting Simulations with yt, Python, and Jupyter Workflows
The Astrophysical Journal Supplement Series · 2026-05-01
article · Open access · Senior author
In the exascale computing era, handling and analyzing massive data sets have become extremely challenging. In situ analysis, which processes data during simulation runtime and bypasses costly intermediate disk input and output steps, offers a promising solution. We present libyt (https://github.com/yt-project/libyt), an open-source C library that enables astrophysical simulations to analyze and visualize data in parallel computation with yt or other Python packages. libyt can invoke Python routines automatically or provide interactive entry points via a Python prompt or a Jupyter Notebook. It requires minimal intervention in researchers’ workflows, allowing users to reuse job submission scripts and Python routines. We describe libyt’s architecture for parallel computing in high-performance computing environments, including its bidirectional connection between simulation codes and Python, and its integration into the Jupyter ecosystem. We detail its methods for reading patch-based adaptive mesh refinement simulations and handling in-memory data with minimal overhead, and procedures for yielding data when requested by Python. We describe how libyt maps simulation data to yt front ends, allowing postprocessing scripts to be converted into in situ analysis with just two lines of change. We document libyt’s application programming interface (API) and demonstrate its integration into two astrophysical simulation codes, GAMER and Enzo, using examples including core-collapse supernovae, isolated dwarf galaxies, fuzzy dark matter, the Sod shock tube test, Kelvin–Helmholtz instability, and the AGORA galaxy simulation. Finally, we discuss libyt’s performance, limitations related to data redistribution, extensibility, architecture, and comparisons with traditional postprocessing approaches.
Now More Than Ever, Foundational AI Research and Infrastructure Depends on the Federal Government
ArXiv.org · 2025-06-17
preprint · Open access
Leadership in the field of AI is vital for our nation's economy and security. Maintaining this leadership requires investments by the federal government. The federal investment in foundation AI research is essential for U.S. leadership in the field. Providing accessible AI infrastructure will benefit everyone. Now is the time to increase the federal support, which will be complementary to, and help drive, the nation's high-tech industry investments.
Spezi Data Pipeline: Streamlining FHIR-based Interoperable Digital Health Data Workflows
ArXiv.org · 2025-09-17
preprint · Open access
The increasing adoption of digital health technologies has amplified the need for robust, interoperable solutions to manage complex healthcare data. We present the Spezi Data Pipeline, an open-source Python toolkit designed to streamline the analysis of digital health data, from secure access and retrieval to processing, visualization, and export. The Pipeline is integrated into the larger Stanford Spezi open-source ecosystem for developing research and translational digital health software systems. Leveraging HL7 FHIR-based data representations, the pipeline enables standardized handling of diverse data types--including sensor-derived observations, ECG recordings, and clinical questionnaires--across research and clinical environments. We detail the modular system architecture and demonstrate its application using real-world data from the PAWS at Stanford University, in which the pipeline facilitated efficient extraction, transformation, and clinician-driven review of Apple Watch ECG data, supporting annotation and comparative analysis alongside traditional monitors. By reducing the need for bespoke development and enhancing workflow efficiency, the Spezi Data Pipeline advances the scalability and interoperability of digital health research, ultimately supporting improved care delivery and patient outcomes.
Developing Library and Data Storytelling Toolkits: Scenarios and Personas
Lecture notes in computer science · 2024-01-01
book chapter · Senior author
Why does the Milky Way have a metallicity floor?
Monthly Notices of the Royal Astronomical Society · 2024-07-19 · 4 citations
article · Open access
The prevalence of light element enhancement in the most metal-poor stars is potentially an indication that the Milky Way has a metallicity floor for star formation around $\sim 10^{-3.5}$ Z$_{\odot }$. We propose that this metallicity floor has its origins in metal-enriched star formation in the minihaloes present during the Galaxy’s initial formation. To arrive at this conclusion, we analyse a cosmological radiation hydrodynamics simulation that follows the concurrent evolution of multiple Population III star-forming minihaloes. The main driver for the central gas within minihaloes is the steady increase in hydrostatic pressure as the haloes grow. We incorporate this insight into a hybrid one-zone model that switches between pressure-confined and modified free-fall modes to evolve the gas density with time according to the ratio of the free-fall and sound-crossing time-scales. This model is able to accurately reproduce the density and chemo-thermal evolution of the gas in each of the simulated minihaloes up to the point of runaway collapse. We then use this model to investigate how the gas responds to the absence of H$_{2}$. Without metals, the central gas becomes increasingly stable against collapse as it grows to the atomic cooling limit. When metals are present in the halo at a level of $\sim 10^{-3.7}$ Z$_{\odot }$, however, the gas is able to achieve gravitational instability while still in the minihalo regime. Thus, we conclude that the Galaxy’s metallicity floor is set by the balance within minihaloes of gas-phase metal cooling and the radiation background associated with its early formation environment.
Libyt: A Tool for Parallel In Situ Analysis with yt, Python, and Jupyter
2024-05-15 · 3 citations
article · Open access · Senior author
In the era of extreme-scale computing, large-scale data storage and analysis have become more critical and challenging. For postprocessing, the simulation first needs to dump snapshots on a hard disk before processing any data. This becomes a bottleneck for high spatial and temporal resolution simulation. In situ analysis provides a viable solution for analyzing extreme scale simulations by processing data in memory, which skips the step of storing data on disk. We present libyt, an open-source C library that allows researchers to analyze and visualize data using yt or other Python packages in parallel computing during simulation runtime. We describe the code method for connecting simulation runtime data to Python, handling data transition and redistribution between Python and simulation processes with minimal memory overhead, and supporting interactive Python prompt and Jupyter Notebook for users to probe the ongoing simulation data at the current time step. We demonstrate how it solves the problem of visualizing large-scale astrophysical simulations, improving disk usage efficiency, and monitoring simulations closely. We conclude it with discussions and compare libyt to post-processing.
Teaching data storytelling as data literacy
Information and Learning Sciences · 2024-04-29 · 6 citations
article · Senior author
Purpose: Data storytelling courses position students as agents in creating stories interpreted from data about a social problem or social justice issue. The purpose of this study is to explore two research questions: What themes characterized students’ iterative development of data story topics? Looking back at six years of iterative feedback, what categories of data literacy pedagogy did instructors engage for these themes? Design/methodology/approach: This project examines six years of data storytelling final projects using thematic analysis and three years of instructor feedback. Ten themes in final projects align with patterns in feedback. Reflections on pedagogical approaches to students’ topic development suggest extending data literacy pedagogy categories – formal, personal and folk (Pangrazio and Sefton-Green, 2020). Findings: Data storytelling can develop students’ abilities to move from being consumers to creators of data and interpretations. The specific topic of personal data exposure or risk has presented some challenges for data literacy instruction (Bowler et al., 2017). What “personal” means in terms of data should be defined more broadly. Extending the data literacy pedagogy categories of formal, personal and folk (Pangrazio and Sefton-Green, 2020) could more effectively center social justice in data literacy instruction. Practical implications: Implications for practice include positioning students as producers of data interpretation, such as role-playing data analysis or decision-making scenarios. Social implications: Data storytelling has the potential to address current challenges in data literacy pedagogy and in teaching critical data literacy. Originality/value: Course descriptions provide a template for future data literacy pedagogy involving data storytelling, and findings suggest implications for expanding definitions and applications of personal and folk data literacies.
Why does the Milky Way have a metallicity floor?
arXiv (Cornell University) · 2024-06-12
preprint · Open access
The prevalence of light element enhancement in the most metal-poor stars is potentially an indication that the Milky Way has a metallicity floor for star formation around $\sim$10$^{-3.5}$ Z$_{\odot}$. We propose that this metallicity floor has its origins in metal-enriched star formation in the minihalos present during the Galaxy's initial formation. To arrive at this conclusion, we analyze a cosmological radiation hydrodynamics simulation that follows the concurrent evolution of multiple Population III star-forming minihalos. The main driver for the central gas within minihalos is the steady increase in hydrostatic pressure as the halos grow. We incorporate this insight into a hybrid one-zone model that switches between pressure-confined and modified free-fall modes to evolve the gas density with time according to the ratio of the free-fall and sound-crossing timescales. This model is able to accurately reproduce the density and chemo-thermal evolution of the gas in each of the simulated minihalos up to the point of runaway collapse. We then use this model to investigate how the gas responds to the absence of H$_{2}$. Without metals, the central gas becomes increasingly stable against collapse as it grows to the atomic cooling limit. When metals are present in the halo at a level of $\sim$10$^{-3.7}$ Z$_{\odot}$, however, the gas is able to achieve gravitational instability while still in the minihalo regime. Thus, we conclude that the Galaxy's metallicity floor is set by the balance within minihalos of gas-phase metal cooling and the radiation background associated with its early formation environment.
Utilizing Adversarial Examples for Bias Mitigation and Accuracy Enhancement
arXiv (Cornell University) · 2024-04-18
preprint · Open access · Senior author
We propose a novel approach to mitigate biases in computer vision models by utilizing counterfactual generation and fine-tuning. While counterfactuals have been used to analyze and address biases in DNN models, the counterfactuals themselves are often generated from biased generative models, which can introduce additional biases or spurious correlations. To address this issue, we propose using adversarial images, that is images that deceive a deep neural network but not humans, as counterfactuals for fair model training. Our approach leverages a curriculum learning framework combined with a fine-grained adversarial loss to fine-tune the model using adversarial examples. By incorporating adversarial images into the training data, we aim to prevent biases from propagating through the pipeline. We validate our approach through both qualitative and quantitative assessments, demonstrating improved bias mitigation and accuracy compared to existing methods. Qualitatively, our results indicate that post-training, the decisions made by the model are less dependent on the sensitive attribute and our model better disentangles the relationship between sensitive attributes and classification variables.
CAVLI - Using image associations to produce local concept-based explanations
2023-06-01 · 4 citations
article · Senior author
While explainability is becoming increasingly crucial in computer vision and machine learning, producing explanations that can link decisions made by deep neural networks to concepts that are easily understood by humans still remains a challenge. To address this challenge, we propose a framework that produces local concept-based explanations for the classification decisions made by a deep neural network. Our framework is based on the intuition that if there is a high overlap between the regions of the image that are associated with a human-defined concept and regions of the image that are useful for decision-making, then the decision is highly dependent on the concept. Our proposed CAVLI framework combines a global approach (TCAV) with a local approach (LIME). To test the effectiveness of the approach, we conducted experiments on both the ImageNet and CelebA datasets. These experiments validate the ability of our framework to quantify the dependence of individual decisions on predefined concepts. By providing local concept-based explanations, our framework has the potential to improve the transparency and interpretability of deep neural networks in a variety of applications.

Recent grants

SI2-SSE: yt: Reusable Components for Simulating, Analyzing and Visualizing Astrophysical Systems
NSF · $494k · 2013–2015
HCC: Small: Telecollaboration in Physical Spaces
NSF · $500k · 2012–2016
RI: Small: Crowd-Sourcing the World: Scalable Methods for Dynamic Structure from Motion
NSF · $477k · 2014–2018
Detecting and Analyzing Discontinuities in Computer Vision
NSF · $328k · 2005–2009
Collaborative Research: SI2-SSI: Inquiry-Focused Volumetric Data Analysis Across Scientific Domains: Sustaining and Expanding the yt Community
NSF · $1.1M · 2017–2023

Frequent coauthors

Tom Abel
412 shared
John Wise
392 shared
Britton Smith
348 shared
G. Desvignes
133 shared
Ue‐Li Pen
99 shared
Shiro Ikeda
The University of Tokyo
94 shared
Jonathan Weintroub
Center for Astrophysics Harvard & Smithsonian
94 shared
Jordy Davelaar
79 shared

Labs

Tague Team Lab · PI

Awards & honors

John Simon Guggenheim Fellowship in Visual Arts (2016)
Making Visible the Invisible (permanent installation at Seat…
Creative Capital Foundation support
Daniel Langlois Foundation for the Arts, Science and Technol…
Canada Council for the Arts support
