
Navid Azizan
· Assistant ProfessorVerifiedMassachusetts Institute of Technology · Mechanical Engineering
Active 2015–2026
About
Navid Azizan is an Assistant Professor of Mechanical Engineering at the Massachusetts Institute of Technology (MIT). His research interests include AI, Machine Learning, and Intelligent Systems, with a focus on Control and Autonomy, Optimization, Energy, Ocean, and Bioengineering. He has a background that includes a Ph.D. from the California Institute of Technology (Caltech) obtained in 2021, a Master of Science from the University of Southern California in 2013, and a Bachelor of Science from Sharif University of Technology. Azizan has held positions such as a Postdoctoral Fellow at Stanford University and a Research Scientist Intern at Google DeepMind. His work involves developing advanced control systems and AI-enabled solutions, exemplified by recent innovations like enabling large language models to dynamically adjust their reasoning processes and improving autonomous drone control in uncertain environments. He has received numerous awards for excellence in teaching, research, and mentoring, including the NSF CAREER Award, the Joseph A. Martore Excellence in Teaching Award, and the Alfred H. and Jean M. Hayes Career Development Chair.
Research topics
- Artificial Intelligence
- Computer Science
- Physics
- Meteorology
Selected publications
IEEE Transactions on Pattern Analysis and Machine Intelligence · 2026-01-01
articleSenior authorDEtection TRansformer (DETR) and its variants have emerged as promising architectures for object detection, offering an end-to-end prediction pipeline. In practice, however, DETRs generate hundreds of predictions that far outnumber the actual objects present in an image. This raises a critical question: which of these predictions could be trusted? This is particularly important for safety-critical applications, such as in autonomous vehicles. Addressing this concern, we provide empirical and theoretical evidence that predictions within the same image play distinct roles, resulting in varying reliability levels. Our analysis reveals that DETRs employ an optimal “specialist strategy”: one prediction per object is trained to be well-calibrated , while the remaining predictions are trained to suppress their foreground confidence to near zero, even when maintaining accurate localization. We show that this strategy emerges as the loss-minimizing solution to the Hungarian matching algorithm, fundamentally shaping DETRs' outputs. While selecting the well-calibrated predictions is ideal, they are unidentifiable at inference time. This means that any post-processing algorithm—used to identify trustworthy predictions—poses a risk of outputting a set of predictions with mixed calibration levels. Therefore, practical deployment necessitates a joint evaluation of both the model's calibration quality and the effectiveness of the post-processing algorithm. However, we demonstrate that existing metrics like average precision and expected calibration error are inadequate for this task. To address this issue, we further introduce Object-level Calibration Error (OCE), which evaluates calibration by aggregating predictions per ground-truth object rather than per prediction. This object-centric design penalizes both retaining suppressed predictions and missed ground truth foreground objects, making OCE suitable for both evaluating models and identifying reliable prediction subsets. Finally, we present a post hoc uncertainty quantification (UQ) framework that predicts per-image model accuracy.
ATOM-CBF: Adaptive Safe Perception-Based Control under Out-of-Distribution Measurements
ArXiv.org · 2025-11-11
preprintOpen accessSenior authorEnsuring the safety of real-world systems is challenging, especially when they rely on learned perception modules to infer the system state from high-dimensional sensor data. These perception modules are vulnerable to epistemic uncertainty, often failing when encountering out-of-distribution (OoD) measurements not seen during training. To address this gap, we introduce ATOM-CBF (Adaptive-To-OoD-Measurement Control Barrier Function), a novel safe control framework that explicitly computes and adapts to the epistemic uncertainty from OoD measurements, without the need for ground-truth labels or information on distribution shifts. Our approach features two key components: (1) an OoD-aware adaptive perception error margin and (2) a safety filter that integrates this adaptive error margin, enabling the filter to adjust its conservatism in real-time. We provide empirical validation in simulations, demonstrating that ATOM-CBF maintains safety for an F1Tenth vehicle with LiDAR scans and a quadruped robot with RGB images.
On the Role of Transformer Feed-Forward Layers in Nonlinear In-Context Learning
ArXiv.org · 2025-01-30
preprintOpen accessSenior authorTransformer-based models demonstrate a remarkable ability for in-context learning (ICL), where they can adapt to unseen tasks from a few prompt examples without parameter updates. Recent research has illuminated how Transformers perform ICL, showing that the optimal linear self-attention (LSA) mechanism can implement one step of gradient descent for linear least-squares objectives when trained on random linear regression tasks. Building on this, we investigate ICL for nonlinear function classes. We first prove that LSA is inherently incapable of outperforming linear predictors on nonlinear tasks, underscoring why prior solutions cannot readily extend to these problems. To overcome this limitation, we analyze a Transformer block consisting of LSA and feed-forward layers inspired by the gated linear units (GLU), which is a standard component of modern Transformers. We show that this block achieves nonlinear ICL by implementing one step of gradient descent on a polynomial kernel regression loss. Furthermore, our analysis reveals that the expressivity of a single block is inherently limited by its dimensions. We then show that a deep Transformer can overcome this bottleneck by distributing the computation of richer kernel functions across multiple blocks, performing block-coordinate descent in a high-dimensional feature space that a single block cannot represent. Our findings highlight that the feed-forward layers provide a crucial and scalable mechanism by which Transformers can express nonlinear representations for ICL.
Probabilistic Forecasting for Building Energy Systems using Time-Series Foundation Models
ArXiv.org · 2025-05-31 · 1 citations
preprintOpen accessDecision-making in building energy systems critically depends on the predictive accuracy of relevant time-series models. In scenarios lacking extensive data from a target building, foundation models (FMs) represent a promising technology that can leverage prior knowledge from vast and diverse pre-training datasets to construct accurate probabilistic predictors for use in decision-making tools. This paper investigates the applicability and fine-tuning strategies of time-series foundation models (TSFMs) in building energy forecasting. We analyze both full fine-tuning and parameter-efficient fine-tuning approaches, particularly low-rank adaptation (LoRA), by using real-world data from a commercial net-zero energy building to capture signals such as room occupancy, carbon emissions, plug loads, and HVAC energy consumption. Our analysis reveals that the zero-shot predictive performance of TSFMs is generally suboptimal. To address this shortcoming, we demonstrate that employing either full fine-tuning or parameter-efficient fine-tuning significantly enhances forecasting accuracy, even with limited historical data. Notably, fine-tuning with low-rank adaptation (LoRA) substantially reduces computational costs without sacrificing accuracy. Furthermore, fine-tuned TSFMs consistently outperform state-of-the-art deep forecasting models (e.g., temporal fusion transformers) in accuracy, robustness, and generalization across varying building zones and seasonal conditions. These results underline the efficacy of TSFMs for practical, data-constrained building energy management systems, enabling improved decision-making in pursuit of energy efficiency and sustainability.
Probabilistic forecasting for building energy systems using time-series foundation models
Energy and Buildings · 2025-09-22 · 3 citations
articleActivation-Informed Merging of Large Language Models
ArXiv.org · 2025-02-04
preprintOpen accessSenior authorModel merging, a method that combines the parameters and embeddings of multiple fine-tuned large language models (LLMs), offers a promising approach to enhance model performance across various tasks while maintaining computational efficiency. This paper introduces Activation-Informed Merging (AIM), a technique that integrates the information from the activation space of LLMs into the merging process to improve performance and robustness. AIM is designed as a flexible, complementary solution that is applicable to any existing merging method. It aims to preserve critical weights from the base model, drawing on principles from continual learning (CL) and model compression. Utilizing a task-agnostic calibration set, AIM selectively prioritizes essential weights during merging. We empirically demonstrate that AIM significantly enhances the performance of merged models across multiple benchmarks. Our findings suggest that considering the activation-space information can provide substantial advancements in the model merging strategies for LLMs, with up to a 40% increase in benchmark performance.
Personalized Collaborative Learning with Affinity-Based Variance Reduction
arXiv (Cornell University) · 2025-10-17
preprintOpen accessSenior authorMulti-agent learning faces a fundamental tension: leveraging distributed collaboration without sacrificing the personalization needed for diverse agents. This tension intensifies when aiming for full personalization while adapting to unknown heterogeneity levels -- gaining collaborative speedup when agents are similar, without performance degradation when they are different. Embracing the challenge, we propose personalized collaborative learning (PCL), a novel framework for heterogeneous agents to collaboratively learn personalized solutions with seamless adaptivity. Through carefully designed bias correction and importance correction mechanisms, our method AffPCL robustly handles both environment and objective heterogeneity. We prove that AffPCL reduces sample complexity over independent learning by a factor of $\max\{n^{-1}, δ\}$, where $n$ is the number of agents and $δ\in[0,1]$ measures their heterogeneity. This affinity-based acceleration automatically interpolates between the linear speedup of federated learning in homogeneous settings and the baseline of independent learning, without requiring prior knowledge of the system. Our analysis further reveals that an agent may obtain linear speedup even by collaborating with arbitrarily dissimilar agents, unveiling new insights into personalization and collaboration in the high heterogeneity regime.
HardFlow: Hard-Constrained Sampling for Flow-Matching Models via Trajectory Optimization
ArXiv.org · 2025-11-11
preprintOpen accessSenior authorDiffusion and flow-matching have emerged as powerful methodologies for generative modeling, with remarkable success in capturing complex data distributions and enabling flexible guidance at inference time. Many downstream applications, however, demand enforcing hard constraints on generated samples (for example, robot trajectories must avoid obstacles), a requirement that goes beyond simple guidance. Prevailing projection-based approaches constrain the entire sampling path to the constraint manifold, which is overly restrictive and degrades sample quality. In this paper, we introduce a novel framework that reformulates hard-constrained sampling as a trajectory optimization problem. Our key insight is to leverage numerical optimal control to steer the sampling trajectory so that constraints are satisfied precisely at the terminal time. By exploiting the underlying structure of flow-matching models and adopting techniques from model predictive control, we transform this otherwise complex constrained optimization problem into a tractable surrogate that can be solved efficiently and effectively. Furthermore, this trajectory optimization perspective offers significant flexibility beyond mere constraint satisfaction, allowing for the inclusion of integral costs to minimize distribution shift and terminal objectives to further enhance sample quality, all within a unified framework. We provide a control-theoretic analysis of our method, establishing bounds on the approximation error between our tractable surrogate and the ideal formulation. Extensive experiments across diverse domains, including robotics (planning), partial differential equations (boundary control), and vision (text-guided image editing), demonstrate that our algorithm, which we name $\textit{HardFlow}$, substantially outperforms existing methods in both constraint satisfaction and sample quality.
Safe Multi-Agent Reinforcement Learning with Convergence to Generalized Nash Equilibrium
arXiv (Cornell University) · 2024-11-22
preprintOpen accessSenior authorMulti-agent reinforcement learning (MARL) has achieved notable success in cooperative tasks, demonstrating impressive performance and scalability. However, deploying MARL agents in real-world applications presents critical safety challenges. Current safe MARL algorithms are largely based on the constrained Markov decision process (CMDP) framework, which enforces constraints only on discounted cumulative costs and lacks an all-time safety assurance. Moreover, these methods often overlook the feasibility issue (the system will inevitably violate state constraints within certain regions of the constraint set), resulting in either suboptimal performance or increased constraint violations. To address these challenges, we propose a novel theoretical framework for safe MARL with $\textit{state-wise}$ constraints, where safety requirements are enforced at every state the agents visit. To resolve the feasibility issue, we leverage a control-theoretic notion of the feasible region, the controlled invariant set (CIS), characterized by the safety value function. We develop a multi-agent method for identifying CISs, ensuring convergence to a Nash equilibrium on the safety value function. By incorporating CIS identification into the learning process, we introduce a multi-agent dual policy iteration algorithm that guarantees convergence to a generalized Nash equilibrium in state-wise constrained cooperative Markov games, achieving an optimal balance between feasibility and performance. Furthermore, for practical deployment in complex high-dimensional systems, we propose $\textit{Multi-Agent Dual Actor-Critic}$ (MADAC), a safe MARL algorithm that approximates the proposed iteration scheme within the deep RL paradigm. Empirical evaluations on safe MARL benchmarks demonstrate that MADAC consistently outperforms existing methods, delivering much higher rewards while reducing constraint violations.
HardNet: Hard-Constrained Neural Networks with Universal Approximation Guarantees
arXiv (Cornell University) · 2024-10-14 · 3 citations
preprintOpen accessSenior authorIncorporating prior knowledge or specifications of input-output relationships into machine learning models has attracted significant attention, as it enhances generalization from limited data and yields conforming outputs. However, most existing approaches use soft constraints by penalizing violations through regularization, which offers no guarantee of constraint satisfaction, especially on inputs far from the training distribution--an essential requirement in safety-critical applications. On the other hand, imposing hard constraints on neural networks may hinder their representational power, adversely affecting performance. To address this, we propose HardNet, a practical framework for constructing neural networks that inherently satisfy hard constraints without sacrificing model capacity. Unlike approaches that modify outputs only at inference time, HardNet enables end-to-end training with hard constraint guarantees, leading to improved performance. To the best of our knowledge, HardNet is the first method that enables efficient and differentiable enforcement of more than one input-dependent inequality constraint. It allows unconstrained optimization of the network parameters using standard algorithms by appending a differentiable closed-form enforcement layer to the network's output. Furthermore, we show that HardNet retains neural networks' universal approximation capabilities. We demonstrate its versatility and effectiveness across various applications: learning with piecewise constraints, learning optimization solvers with guaranteed feasibility, and optimizing control policies in safety-critical systems.
Frequent coauthors
- 21 shared
Babak Hassibi
- 16 shared
Young-Jae Min
- 12 shared
Marco Pavone
Nvidia (United States)
- 11 shared
Spencer M. Richards
- 8 shared
Krishnamurthy Dvijotham
- 7 shared
Jean-Jacques Slotine
- 7 shared
Kwangjun Ahn
- 7 shared
Adam Wierman
California Institute of Technology
Labs
Not provided
Education
- 2020
Ph.D., Computing and Mathematical Sciences Department
California Institute of Technology
Awards & honors
- Joseph A. Martore (1975) Excellence in Teaching Award (2026)
- National Science Foundation CAREER Award (2026)
- INFORMS JFIG Paper Competition Finalist (2025)
- Alfred H. (1929) and Jean M. Hayes Career Development Chair…
- Frank E. Perkins Award for Excellence in Graduate Advising (…
- Resume-aware match score
- Save to shortlist
- AI-drafted outreach
See your match with Navid Azizan
PhdFit ranks faculty by your research interests, methods, and publications — grounded in their actual work, not templates.
- Free to start
- No credit card
- 30-second signup