Ali Mesbah

University of California, Berkeley · Department of Chemical and Biomolecular Engineering

Active 2001–2026

h-index: 32

Citations: 4.4k

Papers: 190 (110 in the last 5 years)

Funding: $1.5M

About

Ali Mesbah is an Associate Professor and the Principal Investigator at Mesbah Lab at UC Berkeley. His research focuses on learning-based analysis and predictive control of uncertain systems. The lab's work involves advanced modeling and control techniques, including reinforcement learning, model predictive control, safe learning-enabled control, Bayesian optimization, and data-driven inverse design. The research projects under his guidance span a variety of applications such as low-temperature plasmas for nanomaterial synthesis, atomic layer etching on superconducting surfaces for quantum computing, and non-Markovian dynamical modeling for molecular systems. Ali Mesbah leads a diverse team of postdoctoral researchers, graduate students, and visiting scholars, contributing to the advancement of control engineering and applied mathematics in complex and uncertain environments.

Research topics

Computer Science
Artificial Intelligence
Machine Learning
Nanotechnology
Physics
Engineering
Astrobiology
Biotechnology
Systems engineering
Biochemical engineering
Biology
Mathematics
Materials science

Selected publications

The Separation Principle and the Dual-Certainty Equivalence Gap in Model Predictive Control
arXiv (Cornell University) · 2026-04-07
article · Open access
Dual control addresses the trade-off between exploitation and exploration, where control inputs both regulate the system and generate informative data for estimation and identification. For certain problem classes, control and estimation can be designed independently without loss of optimality, a property known as the separation principle. However, in stochastic control problems with model uncertainty and constraints, this principle generally breaks down, and introduces the need for dual control. In this paper, we propose an information-weighted dual model predictive control (MPC) formulation and introduce metrics that quantify the dependence of the MPC policy on the uncertainty. We focus on parametric uncertainty in linear systems with Gaussian noise, though the metrics can be applied more broadly. Numerical results show that the dependence of the MPC policy on the posterior covariance is largest under high uncertainty and vanishes as the posterior covariance contracts, providing empirical evidence of the dual effect in closed loop. Moreover, the dual controller improves regulation performance and model accuracy compared to certainty-equivalent MPC.
Publisher OA PDF
Soft MPCritic: Amortized Model Predictive Value Iteration
ArXiv.org · 2026-04-01
article · Open access · Senior author
Reinforcement learning (RL) and model predictive control (MPC) offer complementary strengths, yet combining them at scale remains computationally challenging. We propose soft MPCritic, an RL-MPC framework that learns in (soft) value space while using sample-based planning for both online control and value target generation. soft MPCritic instantiates MPC through model predictive path integral control (MPPI) and trains a terminal Q-function with fitted value iteration, aligning the learned value function with the planner and implicitly extending the effective planning horizon. We introduce an amortized warm-start strategy that recycles planned open-loop action sequences from online observations when computing batched MPPI-based value targets. This makes soft MPCritic computationally practical, while preserving solution quality. soft MPCritic plans in a scenario-based fashion with an ensemble of dynamic models trained for next-step prediction accuracy. Together, these ingredients enable soft MPCritic to learn effectively through robust, short-horizon planning on classic and complex control tasks. These results establish soft MPCritic as a practical and scalable blueprint for synthesizing MPC policies in settings where policy extraction and direct, long-horizon planning may fail.
Publisher OA PDF
Machine learning-based investigation of the relationship between plasma emission spectra and biological responses
Journal of Physics D Applied Physics · 2026-01-02
article · Open access
Cold atmospheric plasma (CAP) has shown promising potential across biomedical applications. However, translating these effects into predictable and reproducible outcomes remains challenging due to device variability and the complex and unknown interplay of reactive species. This study evaluates the potential of machine learning (ML) to reliably predict CAP biological results including 24 h MTT and 48 h MTT from optical emission spectroscopy (OES) data. Data-driven models correlate plasma characteristics with cell viability and metabolic activity outcomes in human dermal fibroblasts. Diverse ML models were employed for their differing capabilities in feature extraction from OES data, essential for assessing the predictive capability of ML models for the biological effects of CAP from OES data. To evaluate cross plasma-device transferability without information leakage, models were tested on two distinct CAP jet systems. While the models achieved high accuracy for the primary jet used in training, their performance degraded considerably when applied to data from the secondary jet. To assess whether the loss of cross-device predictability could in principle be restored through minimal domain calibration, we conducted a controlled fine-tuning ablation on the pre-trained models. Results show that while spectra cannot serve as a device-independent predictor for the acute (24 h MTT) outcome, they retain partial transferability for the delayed (48 h MTT) response when minimal calibration is performed and sufficiently expressive models are employed to extract the underlying transferable structure. Moreover, the analysis identified the plasma source frequency as the most influential operational predictor, followed by voltage and treatment time. In identifying the gas-phase free radicals with the highest impact on biological outcomes, spectral fingerprints show which species contributed most to cell viability.
Publisher DOI
User Preference Meets Pareto-Optimality in Multi-Objective Bayesian Optimization
ArXiv.org · 2025-02-10
preprint · Open access
Incorporating user preferences into multi-objective Bayesian optimization (MOBO) allows for personalization of the optimization procedure. Preferences are often abstracted in the form of an unknown utility function, estimated through pairwise comparisons of potential outcomes. However, utility-driven MOBO methods can yield solutions that are dominated by nearby solutions, as non-dominance is not enforced. Additionally, classical MOBO commonly relies on estimating the entire Pareto-front to identify the Pareto-optimal solutions, which can be expensive and ignore user preferences. Here, we present a new method, termed preference-utility-balanced MOBO (PUB-MOBO), that allows users to disambiguate between near-Pareto candidate solutions. PUB-MOBO combines utility-based MOBO with local multi-gradient descent to refine user-preferred solutions to be near-Pareto-optimal. To this end, we propose a novel preference-dominated utility function that concurrently preserves user-preferences and dominance amongst candidate solutions. A key advantage of PUB-MOBO is that the local search is restricted to a (small) region of the Pareto-front directed by user preferences, alleviating the need to estimate the entire Pareto-front. PUB-MOBO is tested on three synthetic benchmark problems: DTLZ1, DTLZ2 and DH1, as well as on three real-world problems: Vehicle Safety, Conceptual Marine Design, and Car Side Impact. PUB-MOBO consistently outperforms state-of-the-art competitors in terms of proximity to the Pareto-front and utility regret across all the problems.
Publisher OA PDF DOI
A view on learning robust goal-conditioned value functions: Interplay between RL and MPC
Annual Reviews in Control · 2025-01-01 · 1 citation
article · Senior author · Corresponding author
Publisher DOI
A neural master equation framework for multiscale modeling of molecular processes: application to atomic-scale plasma processes
npj Computational Materials · 2025-07-15 · 2 citations
article · Open access · Senior author
Plasma-surface interactions (PSI) play a crucial role in microelectronics fabrication; however, their multiscale nature and array of complex, often unknown interactions make computational modeling of PSIs extremely difficult. To this end, we propose a general neural master equation (NME) framework that uses master equations to describe the dynamics of a molecular process, wherein neural networks learned from atomistic simulations represent unknown transitions between different system states. By leveraging the physics-based structure of master equations and data-driven state transitions, the NME framework promotes generalizability and physics interpretability, and can bridge disparate length and time scales. The framework is demonstrated for multiscale modeling of Si atomic layer etching and reactive ion etching, where the learned NME-based surface kinetic models exhibit good predictive and extrapolative capabilities for predicting experimentally relevant observables as a function of process parameters. The NME-based surface kinetic models obey physical constraints, which are violated in models based on neural ordinary differential equations. The proposed NME framework for multiscale modeling of molecular processes can pave the way for the discovery of new chemistries and materials in atomic-scale plasma processes.
Publisher OA PDF DOI
Model-free Reinforcement Learning for Model-based Control: Towards Safe, Interpretable and Sample-efficient Agents
ArXiv.org · 2025-07-17
preprint · Open access · Senior author
Training sophisticated agents for optimal decision-making under uncertainty has been key to the rapid development of modern autonomous systems across fields. Notably, model-free reinforcement learning (RL) has enabled decision-making agents to improve their performance directly through system interactions, with minimal prior knowledge about the system. Yet, model-free RL has generally relied on agents equipped with deep neural network function approximators, appealing to the networks' expressivity to capture the agent's policy and value function for complex systems. However, neural networks amplify the issues of sample inefficiency, unsafe learning, and limited interpretability in model-free RL. To this end, this work introduces model-based agents as a compelling alternative for control policy approximation, leveraging adaptable models of system dynamics, cost, and constraints for safe policy learning. These models can encode prior system knowledge to inform, constrain, and aid in explaining the agent's decisions, while deficiencies due to model mismatch can be remedied with model-free RL. We outline the benefits and challenges of learning model-based agents -- exemplified by model predictive control -- and detail the primary learning approaches: Bayesian optimization, policy search RL, and offline strategies, along with their respective strengths. While model-free RL has long been established, its interplay with model-based agents remains largely unexplored, motivating our perspective on their combined potentials for sample-efficient learning of safe and interpretable decision-making agents.
Publisher OA PDF DOI
Nitrogen Fixation Optimization Strategy Based on a Tree‐Structured Parzen Estimator‐Multilayer Perceptron (TPE‐MLP) Model
Plasma Processes and Polymers · 2025-06-29
article
Atmospheric‐pressure non‐thermal plasma has shown great potential in the field of nitrogen fixation (NF). In this study, a Magnetic Field Stabilized Glow Discharge (MSGD) device is employed, utilizing Lorentz forces generated by an external magnetic field to stabilize the plasma channel. It enhances gas utilization and reduces energy cost compared to traditional gliding arc discharges. To address the challenge of experimentally optimizing NF energy cost across numerous plasma parameters, a Multilayer Perceptron (MLP) model is trained to predict NOx concentration and NF energy cost. Hyperparameters are optimized using the tree‐structured Parzen estimator (TPE), and gradient analysis is conducted to evaluate feature importance. The results demonstrate that this approach enables accurate prediction and offers an effective strategy for optimizing plasma‐based NF processes.
Publisher DOI

Recent grants

Collaborative Research: Learning-Based Scalable Predictive Control Strategies for Heterogeneous Traffic Networks
NSF · $277k · 2022–2024
Model predictive control under model structure uncertainty for stochastic systems
NSF · $300k · 2017–2021
Collaborative Research: Distributed Predictive Control of Cold Atmospheric Microplasma Jet Arrays for Materials Processing
NSF · $304k · 2019–2022
EAGER: Real-Time: Learning-based Optimal Control of Stochastic Nonlinear Systems
NSF · $226k · 2018–2021
Collaborative Research: Learning and Distributional Feedback Control for Fabrication of Advanced Materials
NSF · $354k · 2021–2025

Frequent coauthors

Joel A. Paulson
58 shared
Georgios Makrygiorgos
University of California, Berkeley
21 shared
Jared O’Leary
18 shared
Angelo D. Bonzanini
University of California, Berkeley
17 shared
Stefan Streif
Chemnitz University of Technology
17 shared
Richard D. Braatz
Massachusetts Institute of Technology
16 shared
David B. Graves
Princeton University
15 shared
Adam P. Arkin
University of California, Berkeley
13 shared

Labs

Mesbah Lab at UC Berkeley · PI

Education

PhD in Systems and Control
Delft University of Technology
Senior Postdoctoral Associate
Massachusetts Institute of Technology
