
Michael Franklin
Professor of Computer Science · Verified · University of Chicago · Computer Science
Active 1977–2025
About
Michael Franklin is the Morton D. Hull Distinguished Service Professor of Computer Science at the University of Chicago. He also serves as Senior Advisor to the Provost for Computing and Data Science and is the founding Faculty Co-Director of the Data Science Institute. Franklin was the inaugural holder of the Liew Family Chair of Computer Science at UChicago, where he led the rapid growth of the department in scale, scope, and stature. His research focuses on large-scale data intelligence systems, including early efforts on massively parallel databases, federated data systems, and scalable data-centric AI systems. Prior to his tenure at UChicago, Franklin was the Thomas M. Siebel Professor of Computer Science at the University of California, Berkeley, where he was on the faculty for 17 years and served as Chair of the Computer Science Division of the EECS Department. He was also the Director of the Algorithms, Machines and People Laboratory (AMPLab) and Principal Investigator of the lab’s NSF CISE Expeditions in Computing award. Franklin is one of the original creators of Apache Spark, a leading open-source platform for advanced data analytics and machine learning developed at AMPLab. He has held visiting positions at MIT CSAIL and at research labs in Hong Kong, Shanghai, and Paris. He is a Founding Advisor at Databricks and a technical advisor to data-driven technology companies and organizations, including Chicago-based startups Invocate, Ocient, and Zengines. Franklin was the founding CEO and CTO of Truviso, a data analytics company acquired by Cisco Systems. He currently serves on the ACM Fellows Selection Committee and the US National Academies Computing Breakthroughs Committee. Franklin is a member of the American Academy of Arts and Sciences, elected in 2023, and is a fellow of the ACM, the AAAS, and the Asia-Pacific Artificial Intelligence Association. 
His awards include the 2022 ACM SIGMOD Systems Award, multiple Test of Time awards, and the CIDR Test of Time award. Franklin holds a Ph.D. from the University of Wisconsin (1993). His research areas include Big Data, Databases, Distributed and Streaming Database Technology, Systems Research, and AI & Machine Learning foundations and applications.
Research topics
- Computer Science
- Data Mining
- Artificial Intelligence
- Machine Learning
- Mathematics
- Data science
- Database
- Social Science
- Statistics
- Information Retrieval
- Computer Security
- World Wide Web
- Finance
- Business
- Economics
- Econometrics
Selected publications
Restoration Ecology · 2025-02-17 · 4 citations
article · Open access · 1st author · Corresponding
To support the persistence of Australian eucalypt woodlands, conservation of remnant vegetation must be augmented with ecological restoration of degraded ecosystems. Certainty about the effectiveness of restoration interventions is urgently required to consistently transition degraded woodlands to reference states. The aim of this meta‐analysis was to quantify the effectiveness of restoration interventions to improve plant and edaphic attributes in degraded temperate and semiarid woodlands of Australia. Our structured literature search retrieved 35 studies that were suitable for analysis, which enabled assessment of six types of restoration interventions and 11 ecosystem response metrics. Effectiveness was quantified using estimates of the probability and magnitude of responses generated from Bayesian multi‐level models. We found consistent increases with varying average levels for carbon (via sugar) addition (43%) and burning (27%) on native plants, burning on cryptogams (91%), and woody debris addition on soil moisture (35%) and carbon (21%). Native plants had a low probability of benefitting from slashing (0.33) or herbicide application (0.09). Slashing had a high probability of increasing introduced plants (0.83). Planting almost always failed to achieve reference levels for native plant communities, introduced plants, or soil phosphorus. A very high level of uncertainty was evident for the outcomes of herbicide and sugar addition on introduced plants. Overall, we found a paucity of adequate studies, including insufficient quantitative information on combinations of interventions, and a lack of effectiveness in common interventions. Our results indicate an urgent need for experiments to be embedded in restoration programs to improve certainty in restoration effectiveness.
Biological Conservation · 2025-10-07
article · Open access · 1st author · Corresponding
The world's dwindling woodlands face ongoing pressures from agriculture and urban expansion, with most having been disturbed by human activities, resulting in degradation and biodiversity losses. To better manage and promote resilience in threatened woodlands, we need to know more about the relationships between structure and diversity in these ecosystems. The influences of tree assemblage attributes on herb species have been investigated in forests, but such effects are poorly understood in more open woodland systems. This study aimed to determine how tree species richness, size class diversity and canopy cover influence native herb species richness and cover in coastal grassy woodlands of south-eastern Australia. Tree and herb composition and structure data were captured in 93 plots situated across the distribution of critically endangered Cumberland Plain Woodland. Bayesian models were used to elicit effects of trees on herbs. Where tree species richness was high, there were 28 % more native herb species on average, with both grasses and forbs contributing to the increase, and herb cover was 16 % higher. Tree size diversity increased native grass cover (by 19 %) but had no effect on forb cover or herb species richness. Native herb cover was 10 % lower under high tree canopy cover. In more recently disturbed woodlands, native grass cover increased under simulated thinning of dense, small tree regrowth, highlighting its potential as a conservation management tool. We found that species rich and structurally diverse canopy tree assemblages supported components of native herb species richness and cover in grassy woodlands.
- Causal influences of trees on woodland herbs were investigated using Bayesian models.
- Native herb species richness and cover were higher under species-rich tree assemblages.
- Tree size diversity increased native grass cover, but not herb species richness.
- Simulated thinning of dense small trees increased tree size diversity and grass cover.
- Diverse canopy tree assemblages support native herbs in declining grassy woodlands.
VUS: Effective and Efficient Accuracy Measures for Time-Series Anomaly Detection
ArXiv.org · 2025-02-18
preprint · Open access
Anomaly detection (AD) is a fundamental task for time-series analytics with important implications for the downstream performance of many applications. In contrast to other domains where AD mainly focuses on point-based anomalies (i.e., outliers in standalone observations), AD for time series is also concerned with range-based anomalies (i.e., outliers spanning multiple observations). Nevertheless, it is common to use traditional point-based information retrieval measures, such as Precision, Recall, and F-score, to assess the quality of methods by thresholding the anomaly score to mark each point as an anomaly or not. However, mapping discrete labels into continuous data introduces unavoidable shortcomings, complicating the evaluation of range-based anomalies. Notably, the choice of evaluation measure may significantly bias the experimental outcome. Despite over six decades of attention, there has never been a large-scale systematic quantitative and qualitative analysis of time-series AD evaluation measures. This paper extensively evaluates quality measures for time-series AD to assess their robustness under noise, misalignments, and different anomaly cardinality ratios. Our results indicate that measures producing quality values independently of a threshold (i.e., AUC-ROC and AUC-PR) are more suitable for time-series AD. Motivated by this observation, we first extend the AUC-based measures to account for range-based anomalies. Then, we introduce a new family of parameter-free and threshold-independent measures, Volume Under the Surface (VUS), to evaluate methods while varying parameters. We also introduce two optimized implementations for VUS that reduce significantly the execution time of the initial implementation. Our findings demonstrate that our four measures are significantly more robust in assessing the quality of time-series AD methods.
VUS: effective and efficient accuracy measures for time-series anomaly detection
The VLDB Journal · 2025-03-27 · 18 citations
article · Open access
The Cambridge Report on Database Research
ArXiv.org · 2025-04-15 · 1 citation
preprint · Open access
On October 19 and 20, 2023, the authors of this report convened in Cambridge, MA, to discuss the state of the database research field, its recent accomplishments and ongoing challenges, and future directions for research and community engagement. This gathering continues a long-standing tradition in the database community, dating back to the late 1980s, in which researchers meet roughly every five years to produce a forward-looking report. This report summarizes the key takeaways from our discussions. We begin with a retrospective on the academic, open source, and commercial successes of the community over the past five years. We then turn to future opportunities, with a focus on core data systems, particularly in the context of cloud computing and emerging hardware, as well as on the growing impact of data science, data governance, and generative AI. This document is not intended as an exhaustive survey of all technical challenges or industry innovations in the field. Rather, it reflects the perspectives of senior community members on the most pressing challenges and promising opportunities ahead.
Water · 2025-08-12
article · Open access
Temperate Highland Peat Swamps on Sandstone (THPSS) are wetlands in the Blue Mountains, south-eastern Australia. The wetlands have legislative protection as endangered ecological communities. They have long-standing cultural significance for Gundungurra Traditional Custodians. Previous studies document their degradation by urban development and vulnerability to extreme weather. Water quality in our study was assessed at wetlands in protected areas and compared with others exposed to urban development. We derived water quality guidelines that are intended to help future water quality assessment at THPSS and, in particular, to detect any impact from urban development on these wetland systems. Water quality in urban swamps was consistent with the freshwater salinisation syndrome despite all the swamps having relatively low electrical conductance (<140 µS cm−1). Urban swamp water had salinity (mean 87.3 µS cm−1) three times that of non-urban swamps (mean 28 µS cm−1). The ionic composition of urban swamp water was dominated by calcium and bicarbonate, consistent with urban alkalisation syndrome. Our guidelines instead recommend limits for pH, salinity, turbidity, dissolved oxygen, and metals detected in greater concentrations that were found in urban swamps (iron, manganese, barium, and strontium). Our results support the theory that the dissolution of urban concrete materials is a degradation process that contributes to the impairment of urban swamp water quality.
Relative Species Mobility Is a Key Determinant of Avian Diversity in Post‐Megafire Recovery
Austral Ecology · 2025-11-01
article · 1st author · Corresponding
Megafires are a class of very large wildfire linked to climate change. Such fires typically cause extensive loss of assets, and while the economic and social costs are often quantified, much less is known about the impacts on forest biota. This study investigated the effects of a megafire of unprecedented scale on birds in dry eucalypt forests of south‐eastern Australia. We aimed to determine how the extent of high‐severity fire influenced patterns of species occurrence and recovery of diversity post‐megafire, with consideration of pre‐fire occurrence and richness. Because high mobility may be an advantageous trait in fire‐prone forests, relatively mobile species (migrants, nomads) and exclusively sedentary species were evaluated separately. Acoustic recorders were used to survey birds in the year before and one year after the megafire. To explore the scale at which birds may respond to high‐severity fire, the proportion of area burnt at high severity was calculated in concentric circles with radii 325 and 564 m from the acoustic recorder in each site. Individual species responses were estimated using a Bayesian latent variable model. Separate Bayesian species richness models were compared based on out‐of‐sample predictive accuracy. Species responses to megafire and the extent of high‐severity fire were mixed (positive, negative, no response), but alpha and gamma diversity were close to pre‐fire levels. Negative responses to megafire shown by several species corresponded with previously published estimates of population declines. Pre‐fire numbers of species in sites predicted post‐fire richness, with high‐severity fire having no additional influence. Relatively mobile species were prominent in the recovery of the avifauna, suggesting that dispersal capacity played an important role in recolonisation. Further studies incorporating fire, climate, environmental attributes and human land use are required to advance our mechanistic understanding of avian occurrence in fire‐prone forests.
Weed Research · 2025-07-01
article · Open access · Senior author
Frogbit (Hydrocharis laevigata) is an aquatic invasive species with a high capacity to disperse, which can lead to rapid colonisation and potential domination of the littoral zone of affected freshwater systems. Little is known about the tolerance of this species to saline waters, or whether it has the potential to impact estuaries or tidal sections of rivers. On receipt of a communication that Frogbit had been observed in an estuary in south‐eastern Australia, we conducted a glasshouse experiment with the aim of making an initial assessment of the salinity tolerance of Frogbit. Mature and juvenile Frogbit ramets were placed in open 1‐L opaque, cylindrical containers with water and six sodium chloride concentrations (control (0 g/L), 1, 2, 5, 10 and 20 g/L) replicated 12 times each. Ramets were assessed as alive or dead at 7, 14 and 21 days. Bayesian logistic regression was used to model Frogbit responses to salinity. At 21 days, all except two ramets remained alive in the control, 1 g/L and 2 g/L NaCl treatments. At 7 days, the probability of mature and juvenile survival began to decline sharply at 5 g/L and was approximately zero at 15 g/L on average. The probability of survival reduced further with time at these intermediate concentrations. All ramets were dead in the 20 g/L solution at day 21. Our results show that while Frogbit may tolerate slightly elevated salinity for short periods, it is unlikely to persist in parts of estuaries where seawater prevails. However, further research is urgently required to understand the extent to which Frogbit can occupy tidal waterways with regular influxes of fresh water on receding tides.
Databases Unbound: Querying All of the World's Bytes with AI
Proceedings of the VLDB Endowment · 2024-08-01 · 10 citations
article
Over the past five decades, the relational database model has proven to be a scalable and adaptable model for querying a variety of structured data, with use cases in analytics, transactions, graphs, streaming and more. However, most of the world's data is unstructured. Thus, despite their success, the reality is that the vast majority of the world's data has remained beyond the reach of relational systems. The rise of deep learning and generative AI offers an opportunity to change this. These models provide a stunning capability to extract semantic understanding from almost any type of document, including text, images, and video, which can extend the reach of databases to all the world's data. In this paper we explore how these new technologies will transform the way we build database management software, creating new systems that can ingest, store, process, and query all data. Building such systems presents many opportunities and challenges. In this paper we focus on three: scalability, correctness, and reliability, and argue that the declarative programming paradigm that has served relational systems so well offers a path forward in the new world of AI data systems as well. To illustrate this, we describe several examples of such declarative AI systems we have built in document and video processing, and provide a set of research challenges and opportunities to guide research in this exciting area going forward.
Epigraph: "And lovely apparitions, -dim at first, / Then radiant, as the mind arising bright / From the embrace of beauty (whence the forms / Of which these are the phantoms) casts on them / The gathered rays which are reality- / Shall visit us the progeny immortal / Of Painting, Sculpture, and rapt Poesy, / And arts, though unimagined, yet to be;" (Prometheus Unbound, Percy Bysshe Shelley)
Riveter: Adaptive Query Suspension and Resumption Framework for Cloud Native Databases
2024-05-13
article
In modern cloud environments, ephemeral resources with intermittent availability and fluctuating monetary costs are becoming common. This dynamic nature presents a new challenge when deploying cloud-native databases: adaptive query execution, which can suspend queries when the resources are scarce or costs unexpectedly soar, and then resume them when the resources become available or cost-effective. Addressing this challenge requires the design and implementation of query suspension and resumption with a mechanism that can adaptively determine when, if, and how to suspend queries. In this paper, we propose Riveter, a query suspension and resumption framework that can adaptively pause ongoing queries using various strategies, including (1) a redo strategy that terminates queries and subsequently re-runs them, (2) a pipeline-level strategy that suspends a query once one of its pipelines has completed to reduce the storage requirements for intermediate data, and (3) a process-level strategy that enables the suspension of query execution processes at any given moment but generates a substantial volume of intermediate data for query resumption. We also devise a cost model to estimate query latency using various strategies and an algorithm to select the one that causes minimum latency. To demonstrate the effectiveness of Riveter, we conduct evaluations based on the TPC-H benchmark to investigate intermediate data persistence, strategy selection, and cost model-based estimation. Our results not only present the difference among the strategies of Riveter in terms of the size of persisted intermediate data and the time of triggering the suspension but also confirm the adaptive and efficient query suspension and resumption delivered by Riveter.
Frequent coauthors
- Tim Kraska · Amazon (United States) · 73 shared
- Sanjay Krishnan · University of Chicago · 61 shared
- Ion Stoica · 47 shared
- Joseph M. Hellerstein · University of California, Berkeley · 46 shared
- Jiannan Wang · Purdue University System · 46 shared
- Aaron J. Elmore · University of Chicago · 37 shared
- Samuel Madden · 35 shared
- Reynold Xin · Databricks (United States) · 28 shared
Education
Ph.D.
University of Wisconsin (1993)
B.S.
University of California, Berkeley
Awards & honors
- 2025 CIDR Test of Time Award
- 2023 Arthur Kelly Faculty Prize
- 2022 ACM SIGMOD Systems Award
- 2021 AAAS Fellow
- 2013 SIGMOD Test of Time Award