Mark van der Laan

PhD · Professor, Biostatistics and Statistics

University of California, Berkeley · Biostatistics

Active 1995–2026

h-index: 44

Citations: 7.1k

Papers: 228 · 94 last 5y

Funding: not listed


About

Mark Johannes van der Laan is the Jiann-Ping Hsu/Karl E. Peace Professor of Biostatistics and Statistics at the University of California, Berkeley. He received his PhD in 1993 from Utrecht University in the Netherlands under the supervision of Richard Gill, and has been at UC Berkeley since taking a position in Biostatistics in 1994. His research contributions include work in survival analysis, semiparametric statistics, multiple testing, censored data, and causal inference. He developed the targeted maximum likelihood methodology and the general theory for super-learning. Van der Laan is a founding editor of the Journal of Causal Inference and the International Journal of Biostatistics. He has authored over 300 publications, written four books on targeted learning, censored data, and multiple testing, and mentored 55 PhD students. His awards include the Mortimer Spiegelman Award (2004), the COPSS Presidents' Award (2005), and the van Dantzig Award (2005).

Research topics

Computer Science
Data science
Medicine
Political Science
Artificial Intelligence
Business
Management science
Accounting
Mathematics
Statistics
Applied mathematics
Risk analysis (engineering)
Process management
Engineering

Selected publications

Sequential invitations to FOBT screening and colorectal cancer incidence
Scientific Reports · 2026-04-18
article · Open access
The effect of different sequences of invitations to Faecal Occult Blood Test (FOBT) screening regarding colorectal cancer (CRC) incidence has never been evaluated. In 2008-2012, all residents in Stockholm-Gotland, Sweden, born 1938-1954, were randomly assigned by birth year to different calendar years of invitation to guaiac-based FOBT (g) or Faecal Immunochemical Test (f) screening at 60-69 years (1-5 rounds), or not (0). Linkage was made to the national Cancer- and Cause of Death Registers on CRC diagnosis and mortality 1958-2020, and the Swedish Colorectal Cancer Register regarding stage. Follow-up started age 60 and CRC incidence, calculated per 100,000 person-years, was assessed during screening (age 60-69) and post screening (age 70-73). Stage I-II and III-IV was assessed post screening. 364,668 individuals were included. During screening, incidence rate ratio was significantly higher in sequences (0, g, g, g, g) (RR 1.25, 95% CI 1.09-1.43), (g, g, g, g, f) (RR 1.17, 95% CI 1.01-1.35), and (g, g, f, f, f) (RR 1.14, 95% CI 1.01-1.29). Post screening, the largest decrease was seen in sequences (g, g, g, g, f) and (g, g, g, f, f), RR 0.65, 95% CI 0.47-0.90, and RR 0.53, 95% CI 0.30-0.94, respectively. There was an overall decreasing trend along sequences from (0, 0, 0, 0, g) to (g, g, f, f, f) post screening and both stages I-II and III-IV (p < 0.001). We could demonstrate a decreased CRC incidence post screening proportional to the number of invitations with implications for future modeling studies and risk-based screening strategies.
The Impact of Job Stability on Monetary Poverty in Italy: Causal Small Area Estimation
Journal of the American Statistical Association · 2025-12-10
article · Open access · Senior author
Job stability, encompassing secure contracts, adequate wages, social benefits, and career opportunities, is a critical determinant in reducing monetary poverty, as it provides households with reliable income and enhances economic well-being. This study draws on EU-SILC survey and census data to estimate the causal effect of job stability on monetary poverty across Italian provinces, quantifying its influence, and analyzing regional disparities. We introduce a novel causal small area estimation (CSAE) framework that integrates global and local estimation strategies for heterogeneous treatment effect estimation, effectively addressing data sparsity at the provincial level. Furthermore, we develop a general bootstrap scheme to construct reliable confidence intervals, applicable regardless of the method used for estimating nuisance parameters. Extensive simulation studies demonstrate that our proposed estimators outperform classical causal inference methods in terms of stability while maintaining computational scalability for large datasets. Applying this methodology to real-world data, we uncover significant relationships between job stability and poverty in six Italian regions, offering critical insights into regional disparities and their implications for evidence-based policy design. Supplementary materials for this article are available online, including a standardized description of the materials available for reproducing the work.
Efficient Statistical Estimation for Sequential Adaptive Experiments with Implications for Adaptive Designs
ArXiv.org · 2025-08-12
preprint · Open access · Senior author
Adaptive experimental designs have gained popularity in clinical trials and online experiments. Unlike traditional, fixed experimental designs, adaptive designs can dynamically adjust treatment randomization probabilities and other design features in response to data accumulated sequentially during the experiment. These adaptations are useful to achieve diverse objectives, including reducing uncertainty in the estimation of causal estimands or increasing participants' chances of receiving better treatments during the experiment. At the end of the experiment, it is often desirable to answer causal questions from the observed data. However, the adaptive nature of such experiments and the resulting dependence among observations pose significant challenges to providing valid statistical inference and efficient estimation of causal estimands. Building upon the Targeted Maximum Likelihood Estimator (TMLE) framework tailored for adaptive designs (van der Laan, 2008), we introduce a new Adaptive-Design-Likelihood-based TMLE (ADL-TMLE) to estimate a wide class of causal estimands from adaptive experiment data, including the average treatment effect as our primary example. We establish asymptotic normality and semiparametric efficiency of ADL-TMLE under relaxed positivity and design stabilization assumptions for adaptive experiments. Motivated by these results, we further propose a novel adaptive design aimed at minimizing the variance of the estimator based on data generated under that design. Simulations show that ADL-TMLE demonstrates superior variance-reduction performance across different types of adaptive experiments, and that the proposed adaptive design attains lower variance than the standard efficiency-oriented adaptive design. Finally, we generalize our framework to broader settings, including those with longitudinal structures.
Queueing Causal Models: Comparative Analytics in Queueing Systems
SSRN Electronic Journal · 2025-01-01
preprint · Open access
Causal inference for calibrated scaling interventions on time-to-event processes
ArXiv.org · 2025-10-19
preprint · Open access · Senior author
This work develops a flexible inferential framework for nonparametric causal inference in time-to-event settings, based on stochastic interventions defined through multiplicative scaling of the intensity governing an intermediate event process. These interventions induce a family of estimands indexed by a scalar parameter α, representing effects of modifying event rates while preserving the temporal and covariate-dependent structure of the observed data generating mechanism. To enhance interpretability, we introduce calibrated interventions, where α is chosen to achieve a pre-specified goal, such as a desired level of cumulative risk of the intermediate event, and define corresponding composite target parameters capturing the downstream effects on the outcome process. This yields clinically meaningful contrasts while avoiding unrealistic deterministic intervention regimes. Under a nonparametric model, we derive efficient influence curves for α-indexed, calibrated, and composite target parameters and establish their double robustness properties. We further sketch a targeted maximum likelihood estimation (TMLE) strategy that accommodates flexible, machine learning based nuisance estimation. The proposed framework applies broadly to (causal) questions involving time-to-event treatments or mediators and is illustrated through different examples in event-history settings. A simulation study demonstrates finite-sample inferential properties, and highlights the implications of practical positivity violations when interventions extend beyond observed data support.
Efficacy and safety of belumosudil as compared with best available therapy for the treatment of cGVHD in the United States
Blood Advances · 2025-08-28
article · Open access
Belumosudil was approved by the Food and Drug Administration in the United States for the treatment of relapsed/refractory chronic graft-versus-host disease (cGVHD) based on a randomized phase 2 trial comparing 2 belumosudil doses. The efficacy and safety of belumosudil vs the best available therapy (BAT) has not been studied. Applying rigorous statistical methodology to real-world data, this study estimated the efficacy of belumosudil vs BAT in cGVHD patients whose disease failed to respond to 2 to 5 prior lines of therapy (LOTs). Retrospective data between March 2015 and 2024 were collected across 8 US sites for 196 patients, contributing 113 belumosudil and 245 BAT LOTs. The primary outcome was 6-month overall response rate (ORR), defined as the proportion of complete or partial responses based on 2014 National Institutes of Health consensus criteria, physician assessment, or corticosteroid dose taper of ≥50% without cGVHD progression. Death, relapse, and beginning a new LOT were considered a lack of response. Targeted maximum likelihood estimation (TMLE) was used to estimate the 6-month ORR following belumosudil vs BAT (38.7% vs 26.8%, respectively) or 44.2% improvement with belumosudil (1-sided 95% confidence interval [CI], [4.4 to ∞]; P = .031). TMLE was also used to estimate 1-year failure-free survival when treated with belumosudil (61.2%) or BAT (47.8%), a 13.5% difference (95% CI, 1.5-100; P = .032). Descriptive assessment of safety showed adverse events recorded in 27% of belumosudil and 36% of BAT LOTs. Findings demonstrated that belumosudil improved clinical outcomes compared to BAT in cGVHD patients with 2 to 5 prior LOTs, and safety was consistent with belumosudil's established profile.
Note on targeted learning with an undersmoothed Lasso propensity score model for large-scale covariate adjustment in health care database studies
American Journal of Epidemiology · 2025-02-05 · 1 citation
article · Open access
An Estimator-Robust Design for Augmenting Randomized Controlled Trial with External Real-World Data
ArXiv.org · 2025-01-29
preprint · Open access · Senior author
Augmenting randomized controlled trials (RCTs) with external real-world data (RWD) has the potential to improve the finite sample efficiency of treatment effect estimators. We describe using adaptive targeted maximum likelihood estimation (A-TMLE) for estimating the average treatment effect (ATE) by decomposing the ATE estimand into two components: a pooled-ATE estimand that combines data from both the RCT and external sources, and a bias estimand that captures the conditional effect of RCT enrollment on the outcome. This approach views the RCT data as the reference and corrects for inconsistencies of any kind between the RCT and the external data source. Given the growing abundance of external RWD from modern electronic health records, determining the optimal strategy to select candidate external patients for data integration remains an open yet critical problem. In this work, we begin by analyzing the robustness property of the A-TMLE estimator and then propose a matching-based sampling strategy that improves the robustness of the estimator with respect to the target estimand. Our proposed strategy is outcome-blind and involves matching based on two one-dimensional scores: the trial enrollment score and the propensity score in the external data. We demonstrate in simulations that our sampling strategy improves the coverage and shortens the widths of confidence intervals produced by A-TMLE. We illustrate our method with a case study of augmenting the DEVOTE cardiovascular safety trial by using the Optum Clinformatics claims database.
Targeted Deep Architectures: A TMLE-Based Framework for Robust Causal Inference in Neural Networks
ArXiv.org · 2025-07-16
preprint · Open access · Senior author
Modern deep neural networks are powerful predictive tools yet often lack valid inference for causal parameters, such as treatment effects or entire survival curves. While frameworks like Double Machine Learning (DML) and Targeted Maximum Likelihood Estimation (TMLE) can debias machine-learning fits, existing neural implementations either rely on "targeted losses" that do not guarantee solving the efficient influence function equation or computationally expensive post-hoc "fluctuations" for multi-parameter settings. We propose Targeted Deep Architectures (TDA), a new framework that embeds TMLE directly into the network's parameter space with no restrictions on the backbone architecture. Specifically, TDA partitions model parameters - freezing all but a small "targeting" subset - and iteratively updates them along a targeting gradient, derived from projecting the influence functions onto the span of the gradients of the loss with respect to weights. This procedure yields plug-in estimates that remove first-order bias and produce asymptotically valid confidence intervals. Crucially, TDA easily extends to multi-dimensional causal estimands (e.g., entire survival curves) by merging separate targeting gradients into a single universal targeting update. Theoretically, TDA inherits classical TMLE properties, including double robustness and semiparametric efficiency. Empirically, on the benchmark IHDP dataset (average treatment effects) and simulated survival data with informative censoring, TDA reduces bias and improves coverage relative to both standard neural-network estimators and prior post-hoc approaches. In doing so, TDA establishes a direct, scalable pathway toward rigorous causal inference within modern deep architectures for complex multi-parameter targets.
Commentary on “Nonparametric identification is not enough, but randomized controlled trials are”: Statistical considerations for generating reliable evidence across a spectrum of studies that increasingly involve real-world elements
Observational Studies · 2025-03-01
article · Open access · Senior author
Judea Pearl, quoted in Pearl and Mackenzie (2008), stated that "once we have understood why [randomized controlled trials] RCTs work, there is no need to put them on a pedestal and treat them as the gold standard of causal analysis, which all other methods should emulate." In Aronow et al. (2024), this claim is refuted, drawing on results of Robins and Ritov (1997). The argument is made that statistical estimation and inference tend to be fundamentally more difficult in observational studies than in randomized controlled trials, even when all confounders are observed and measured without error. We congratulate the authors for raising this highly timely, interesting discussion and welcome this opportunity to join this important debate. In this commentary, we focus on what it takes to generate reliable evidence across a spectrum of studies that increasingly involve real-world elements and less control over design. A related question is whether, along this spectrum of studies, the reliability of evidence generated by a statistical analysis decreases. We claim that this is not the case, but that the challenge for the appropriate statistical method increases, requiring sophisticated and careful execution.

Frequent coauthors

Antoine Chambaz
Centre National de la Recherche Scientifique
86 shared
Nicholas P. Jewell
London School of Hygiene & Tropical Medicine
78 shared
James M. Robins
Harvard University
68 shared
Alex Luedtke
67 shared
Daniel B. Rubin
Massachusetts General Hospital
65 shared
Alessio Mi
Université Paris Cité
64 shared
Moulinath Banerjee
64 shared
Ian W. McKeague
Columbia University
64 shared

Awards & honors

COPSS Presidents' Award (2005)
Mortimer Spiegelman Award (2004)
van Dantzig Award (2005)
