
Ali Mesbah
University of California, Berkeley · Department of Chemical and Biomolecular Engineering
Active 2001–2026
About
Ali Mesbah is an Associate Professor at UC Berkeley and the Principal Investigator of the Mesbah Lab. His research focuses on learning-based analysis and predictive control of uncertain systems. The lab's work involves advanced modeling and control techniques, including reinforcement learning, model predictive control, safe learning-enabled control, Bayesian optimization, and data-driven inverse design. Research projects under his guidance span applications such as low-temperature plasmas for nanomaterial synthesis, atomic layer etching on superconducting surfaces for quantum computing, and non-Markovian dynamical modeling for molecular systems. He leads a team of postdoctoral researchers, graduate students, and visiting scholars working on control engineering and applied mathematics for complex and uncertain environments.
Research topics
- Computer Science
- Artificial Intelligence
- Machine Learning
- Nanotechnology
- Physics
- Engineering
- Astrobiology
- Biotechnology
- Systems engineering
- Biochemical engineering
- Biology
- Mathematics
- Materials science
Selected publications
The Separation Principle and the Dual-Certainty Equivalence Gap in Model Predictive Control
arXiv (Cornell University) · 2026-04-07
article · Open access
Dual control addresses the trade-off between exploitation and exploration, where control inputs both regulate the system and generate informative data for estimation and identification. For certain problem classes, control and estimation can be designed independently without loss of optimality, a property known as the separation principle. However, in stochastic control problems with model uncertainty and constraints, this principle generally breaks down, and introduces the need for dual control. In this paper, we propose an information-weighted dual model predictive control (MPC) formulation and introduce metrics that quantify the dependence of the MPC policy on the uncertainty. We focus on parametric uncertainty in linear systems with Gaussian noise, though the metrics can be applied more broadly. Numerical results show that the dependence of the MPC policy on the posterior covariance is largest under high uncertainty and vanishes as the posterior covariance contracts, providing empirical evidence of the dual effect in closed loop. Moreover, the dual controller improves regulation performance and model accuracy compared to certainty-equivalent MPC.
Soft MPCritic: Amortized Model Predictive Value Iteration
ArXiv.org · 2026-04-01
article · Open access · Senior author
Reinforcement learning (RL) and model predictive control (MPC) offer complementary strengths, yet combining them at scale remains computationally challenging. We propose soft MPCritic, an RL-MPC framework that learns in (soft) value space while using sample-based planning for both online control and value target generation. soft MPCritic instantiates MPC through model predictive path integral control (MPPI) and trains a terminal Q-function with fitted value iteration, aligning the learned value function with the planner and implicitly extending the effective planning horizon. We introduce an amortized warm-start strategy that recycles planned open-loop action sequences from online observations when computing batched MPPI-based value targets. This makes soft MPCritic computationally practical, while preserving solution quality. soft MPCritic plans in a scenario-based fashion with an ensemble of dynamic models trained for next-step prediction accuracy. Together, these ingredients enable soft MPCritic to learn effectively through robust, short-horizon planning on classic and complex control tasks. These results establish soft MPCritic as a practical and scalable blueprint for synthesizing MPC policies in settings where policy extraction and direct, long-horizon planning may fail.
Journal of Physics D Applied Physics · 2026-01-02
article · Open access
Cold atmospheric plasma (CAP) has shown promising potential across biomedical applications. However, translating these effects into predictable and reproducible outcomes remains challenging due to device variability and the complex and unknown interplay of reactive species. This study evaluates the potential of machine learning (ML) to reliably predict CAP biological results, including 24 h MTT and 48 h MTT, from optical emission spectroscopy (OES) data. Data-driven models correlate plasma characteristics with cell viability and metabolic activity outcomes in human dermal fibroblasts. Diverse ML models were employed for their differing capabilities in feature extraction from OES data, which is essential for assessing how well ML models can predict the biological effects of CAP from OES data. To evaluate cross plasma-device transferability without information leakage, models were tested on two distinct CAP jet systems. While the models achieved high accuracy for the primary jet used in training, their performance degraded considerably when applied to data from the secondary jet. To assess whether the loss of cross-device predictability could in principle be restored through minimal domain calibration, we conducted a controlled fine-tuning ablation on the pre-trained models. Results show that while spectra cannot serve as a device-independent predictor for the acute (24 h MTT) outcome, they retain partial transferability for the delayed (48 h MTT) response when minimal calibration is performed and sufficiently expressive models are employed to extract the underlying transferable structure. Moreover, the analysis identified the plasma source frequency as the most influential operational predictor, followed by voltage and treatment time. In identifying the gas-phase free radicals with the highest impact on biological outcomes, spectral fingerprints show the species that contribute most to cell viability.
User Preference Meets Pareto-Optimality in Multi-Objective Bayesian Optimization
ArXiv.org · 2025-02-10
preprint · Open access
Incorporating user preferences into multi-objective Bayesian optimization (MOBO) allows for personalization of the optimization procedure. Preferences are often abstracted in the form of an unknown utility function, estimated through pairwise comparisons of potential outcomes. However, utility-driven MOBO methods can yield solutions that are dominated by nearby solutions, as non-dominance is not enforced. Additionally, classical MOBO commonly relies on estimating the entire Pareto-front to identify the Pareto-optimal solutions, which can be expensive and ignore user preferences. Here, we present a new method, termed preference-utility-balanced MOBO (PUB-MOBO), that allows users to disambiguate between near-Pareto candidate solutions. PUB-MOBO combines utility-based MOBO with local multi-gradient descent to refine user-preferred solutions to be near-Pareto-optimal. To this end, we propose a novel preference-dominated utility function that concurrently preserves user-preferences and dominance amongst candidate solutions. A key advantage of PUB-MOBO is that the local search is restricted to a (small) region of the Pareto-front directed by user preferences, alleviating the need to estimate the entire Pareto-front. PUB-MOBO is tested on three synthetic benchmark problems: DTLZ1, DTLZ2 and DH1, as well as on three real-world problems: Vehicle Safety, Conceptual Marine Design, and Car Side Impact. PUB-MOBO consistently outperforms state-of-the-art competitors in terms of proximity to the Pareto-front and utility regret across all the problems.
A view on learning robust goal-conditioned value functions: Interplay between RL and MPC
Annual Reviews in Control · 2025-01-01 · 1 citation
article · Senior author · Corresponding
npj Computational Materials · 2025-07-15 · 2 citations
article · Open access · Senior author
Plasma-surface interactions (PSI) play a crucial role in microelectronics fabrication; however, their multiscale nature and array of complex, often unknown interactions make computational modeling of PSIs extremely difficult. To this end, we propose a general neural master equation (NME) framework that uses master equations to describe the dynamics of a molecular process, wherein neural networks learned from atomistic simulations represent unknown transitions between different system states. By leveraging the physics-based structure of master equations and data-driven state transitions, the NME framework promotes generalizability and physics interpretability, and can bridge disparate length and time scales. The framework is demonstrated for multiscale modeling of Si atomic layer etching and reactive ion etching, where the learned NME-based surface kinetic models exhibit good predictive and extrapolative capabilities for predicting experimentally relevant observables as a function of process parameters. The NME-based surface kinetic models obey physical constraints, which are violated in models based on neural ordinary differential equations. The proposed NME framework for multiscale modeling of molecular processes can pave the way for the discovery of new chemistries and materials in atomic-scale plasma processes.
ArXiv.org · 2025-07-17
preprint · Open access · Senior author
Training sophisticated agents for optimal decision-making under uncertainty has been key to the rapid development of modern autonomous systems across fields. Notably, model-free reinforcement learning (RL) has enabled decision-making agents to improve their performance directly through system interactions, with minimal prior knowledge about the system. Yet, model-free RL has generally relied on agents equipped with deep neural network function approximators, appealing to the networks' expressivity to capture the agent's policy and value function for complex systems. However, neural networks amplify the issues of sample inefficiency, unsafe learning, and limited interpretability in model-free RL. To this end, this work introduces model-based agents as a compelling alternative for control policy approximation, leveraging adaptable models of system dynamics, cost, and constraints for safe policy learning. These models can encode prior system knowledge to inform, constrain, and aid in explaining the agent's decisions, while deficiencies due to model mismatch can be remedied with model-free RL. We outline the benefits and challenges of learning model-based agents -- exemplified by model predictive control -- and detail the primary learning approaches: Bayesian optimization, policy search RL, and offline strategies, along with their respective strengths. While model-free RL has long been established, its interplay with model-based agents remains largely unexplored, motivating our perspective on their combined potentials for sample-efficient learning of safe and interpretable decision-making agents.
Plasma Processes and Polymers · 2025-06-29
article
Atmospheric‐pressure non‐thermal plasma has shown great potential in the field of nitrogen fixation (NF). In this study, a Magnetic Field Stabilized Glow Discharge (MSGD) device is employed, utilizing Lorentz forces generated by an external magnetic field to stabilize the plasma channel. It enhances gas utilization and reduces energy cost compared to traditional gliding arc discharges. To address the challenge of experimentally optimizing NF energy cost across numerous plasma parameters, a Multilayer Perceptron (MLP) model is trained to predict NOx concentration and NF energy cost. Hyperparameters are optimized using the tree‐structured Parzen estimator (TPE), and gradient analysis is conducted to evaluate feature importance. The results demonstrate that this approach enables accurate prediction and offers an effective strategy for optimizing plasma‐based NF processes.
Recent grants
NSF · $277k · 2022–2024
Model predictive control under model structure uncertainty for stochastic systems
NSF · $300k · 2017–2021
NSF · $304k · 2019–2022
EAGER: Real-Time: Learning-based Optimal Control of Stochastic Nonlinear Systems
NSF · $226k · 2018–2021
NSF · $354k · 2021–2025
Frequent coauthors
- Joel A. Paulson · 58 shared
- Georgios Makrygiorgos · University of California, Berkeley · 21 shared
- Jared O’Leary · 18 shared
- Angelo D. Bonzanini · University of California, Berkeley · 17 shared
- Stefan Streif · Chemnitz University of Technology · 17 shared
- Richard D. Braatz · Massachusetts Institute of Technology · 16 shared
- David B. Graves · Princeton University · 15 shared
- Adam P. Arkin · University of California, Berkeley · 13 shared
Education
PhD in Systems and Control
Delft University of Technology
Senior Postdoctoral Associate
Massachusetts Institute of Technology