
Aditya Grover
Professor · University of California, Los Angeles · Computer Science
Active 2002–2026
About
Aditya Grover is an Assistant Professor of Computer Science at the UCLA Samueli School of Engineering. His research focuses on probabilistic machine learning for unsupervised representation learning and sequential decision making. His contributions to neural information processing include divergence-based generative modeling on manifolds. His awards include the Forbes 30 Under 30 in Science (2023), the AI2050 Early Career Fellowship (2024), and the Samsung AI Researcher of the Year Award (2022), and his work has been covered in the media.
Research topics
- Computer Science
- Machine Learning
- Artificial Intelligence
- Engineering
- Meteorology
- Ecology
- Electrical engineering
- Geography
- Mathematics
- Reliability engineering
Selected publications
Reassessing the Scaling of AI-Powered Climate Models Against Dynamical Counterparts
2026-03-14
Article · Open access
Are AI-powered climate models intrinsically more efficient than traditional climate models? While progress is still needed before they become operational, hybrid AI-physics climate models and AI emulators of climate models have the potential to sharply reduce inference cost relative to traditional CPU-based models, allowing larger ensembles to explore different scenarios and sharpen uncertainty estimation. Yet this apparent efficiency becomes less obvious when the comparison includes GPU-ported dynamical climate models, and when efficiency is assessed against the effective complexity of the simulated climate system.
As a first step, recognizing that a perfect apples-to-apples comparison is rarely possible from reported configurations, we synthesize reported performance for leading AI climate model emulators (e.g., ACE2, CAMulator), hybrid AI-physics models (e.g., CliMA, NeuralGCM), and GPU-accelerated traditional models (e.g., SCREAM, ICON). We examine two complementary scaling views. The first compares throughput (simulated years per day) per accelerator (GPUs or TPUs) and per prognostic variable, as a function of horizontal grid spacing. The second compares the same normalized throughput against an effective complexity proxy, defined as the number of vertical levels divided by the product of the time step and the squared horizontal grid spacing, to account for the simulated vertical structure and, importantly, time-step constraints imposed by numerical stability.
We find that AI-powered models can show favorable apparent scaling with horizontal resolution in raw throughput, but that the advantage becomes modest once effective complexity is accounted for: at comparable complexity, AI climate models do not appear intrinsically more efficient than GPU-ported dynamical models.
Hybrid approaches occupy a distinct middle ground: their acceleration and added value come primarily from learned parameterizations that improve the representation of unresolved processes while the overall model retains a physically-based dynamical core, including explicit conservation laws. AI climate model emulators, by contrast, offer their clearest computational advantage through task-targeted prediction, where a limited set of climate-relevant variables can be directly simulated on the grid of interest. This avoids integrating the full high-frequency, multivariate state at the short time step traditionally required for numerical stability, which is especially advantageous when emulating a fine-resolution reference model with a coarser emulator. Diverse downscaling or targeted post-processing strategies can further substitute for explicit fine-scale resolution when observations are available, enabling inexpensive local or hazard-specific risk assessment at decadal to multi-decadal time horizons.
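The two normalizations described in the abstract can be written out directly. This is a minimal sketch: the abstract does not state units, so seconds for the time step and kilometres for the grid spacing are assumptions, and the example inputs are hypothetical rather than taken from any model in the study.

```python
# Sketch of the abstract's two scaling views.
# Assumed units (not given in the abstract): time step in seconds, grid spacing in km.

def normalized_throughput(sypd: float, n_accelerators: int, n_prognostic_vars: int) -> float:
    """Simulated years per day (SYPD) per accelerator and per prognostic variable."""
    if n_accelerators <= 0 or n_prognostic_vars <= 0:
        raise ValueError("counts must be positive")
    return sypd / (n_accelerators * n_prognostic_vars)


def effective_complexity(n_levels: int, dt_seconds: float, dx_km: float) -> float:
    """Complexity proxy: vertical levels / (time step * horizontal grid spacing**2)."""
    if dt_seconds <= 0 or dx_km <= 0:
        raise ValueError("time step and grid spacing must be positive")
    return n_levels / (dt_seconds * dx_km ** 2)


if __name__ == "__main__":
    # Hypothetical configuration, not a reported model.
    print(normalized_throughput(sypd=120.0, n_accelerators=4, n_prognostic_vars=6))  # 5.0
    base = effective_complexity(n_levels=10, dt_seconds=100.0, dx_km=1.0)
    finer = effective_complexity(n_levels=10, dt_seconds=100.0, dx_km=0.5)
    print(finer / base)  # halving the grid spacing quadruples the proxy: 4.0
```

Because grid spacing enters squared and the time step enters linearly, the proxy rises quickly with resolution, which is why a model that must shorten its time step for stability is credited with more effective complexity per simulated year.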
Iceberg: Enhancing HLS Modeling with Synthetic Data
2025-06-26 · 1 citation
Article
Deep learning-based prediction models for High-Level Synthesis (HLS) of hardware designs often struggle to generalize. In this paper, we study how to close the generalizability gap of these models through pretraining on synthetic data and introduce Iceberg, a synthetic data augmentation approach that expands both large language model (LLM)-generated programs and weak labels of unseen design configurations. Our weak label generation method is integrated with an in-context model architecture, enabling meta-learning from actual and proximate labels. Iceberg improves the geometric mean modeling accuracy by 86.4% when adapting to six real-world applications with few-shot examples, and achieves 2.47× and 1.12× better offline design space exploration (DSE) performance when adapting to two different test datasets. Our code is open-sourced at https://github.com/UCLA-VAST/iceberg.
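The accuracy figure above is aggregated as a geometric mean across applications. A minimal sketch of that aggregation follows, assuming each application contributes one strictly positive score; the input values are hypothetical and the paper's exact per-application metric is not reproduced here.

```python
import math


def geometric_mean(values: list[float]) -> float:
    """Geometric mean of strictly positive values, computed in log space for stability."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("need a non-empty list of positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))


if __name__ == "__main__":
    # Hypothetical per-application scores, not results from the paper.
    print(geometric_mean([1.0, 4.0]))  # ~2.0, versus an arithmetic mean of 2.5
```

Unlike an arithmetic mean, the geometric mean is not dominated by one unusually high score, so an improvement has to hold across applications to move it much.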
IndiaWeatherBench: A Dataset and Benchmark for Data-Driven Regional Weather Forecasting over India
ArXiv.org · 2025-08-31
Preprint · Open access · Senior author
Regional weather forecasting is a critical problem for localized climate adaptation, disaster mitigation, and sustainable development. While machine learning has shown impressive progress in global weather forecasting, regional forecasting remains comparatively underexplored. Existing efforts often use different datasets and experimental setups, limiting fair comparison and reproducibility. We introduce IndiaWeatherBench, a comprehensive benchmark for data-driven regional weather forecasting focused on the Indian subcontinent. IndiaWeatherBench provides a curated dataset built from high-resolution regional reanalysis products, along with a suite of deterministic and probabilistic metrics to facilitate consistent training and evaluation. To establish strong baselines, we implement and evaluate a range of models across diverse architectures, including UNets, Transformers, and Graph-based networks, as well as different boundary conditioning strategies and training objectives. While focused on India, IndiaWeatherBench is easily extensible to other geographic regions. We open-source all raw and preprocessed datasets, model implementations, and evaluation pipelines to promote accessibility and future development. We hope IndiaWeatherBench will serve as a foundation for advancing regional weather forecasting research. Code is available at https://github.com/tung-nd/IndiaWeatherBench.
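The abstract mentions deterministic and probabilistic metrics without naming them. As an illustration only, here is a sketch of two common choices, root-mean-square error (deterministic) and the ensemble continuous ranked probability score (probabilistic); whether IndiaWeatherBench uses these exact forms, or adds area weighting, is an assumption not confirmed by the abstract.

```python
import math


def rmse(pred: list[float], target: list[float]) -> float:
    """Root-mean-square error over matching grid points (deterministic metric)."""
    if len(pred) != len(target) or not target:
        raise ValueError("pred and target must be non-empty and equally long")
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target))


def crps_ensemble(members: list[float], obs: float) -> float:
    """Ensemble CRPS for one scalar observation (probabilistic metric):
    mean |x_i - y| - 0.5 * mean |x_i - x_j| over all member pairs."""
    if not members:
        raise ValueError("need at least one ensemble member")
    m = len(members)
    skill = sum(abs(x - obs) for x in members) / m
    spread = sum(abs(xi - xj) for xi in members for xj in members) / (m * m)
    return skill - 0.5 * spread
```

With a single ensemble member the spread term vanishes and the CRPS reduces to the absolute error, which is one reason it is a natural probabilistic counterpart to deterministic error metrics.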
2025-01-01 · 1 citation
Article · Open access · Senior author
A common technique for aligning large language models (LLMs) relies on acquiring human preferences by comparing multiple generations conditioned on a fixed context. This method, however, relies solely on pairwise comparisons, where the generations are evaluated within an identical context. While effective, such conditional preferences often fail to encompass the nuanced and multidimensional nature of human preferences. In this work, we revisit the traditional paradigm of preference acquisition and propose a new axis based on eliciting preferences jointly over instruction-response pairs. Unlike prior preference optimizations, which are designed for conditional ranking protocols (e.g., DPO), we propose Joint Preference Optimization (JPO), a new preference optimization objective that upweights the joint probability of the chosen instruction-response pair over the rejected instruction-response pair. Interestingly, LLMs trained with joint instruction-response preference data using JPO outperform LLMs trained with DPO by 5.2% and 3.3% win-rate on summarization and open-ended dialogue datasets, respectively. Our findings reveal that joint preferences over instruction and response pairs can significantly enhance the alignment of LLMs by tapping into a broader spectrum of human preference elicitation. The data and code are available at https://github.com/Hritikbansal/jpo.
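To make the contrast with DPO concrete, the sketch below applies a DPO-style logistic loss to a preference between two instruction-response pairs that need not share an instruction. This is an illustration under assumptions, not the paper's exact objective: each input is taken to be the log-probability a model assigns to a whole pair, and `beta` is a hypothetical temperature.

```python
import math


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def pairwise_preference_loss(
    logp_chosen: float,        # policy log-prob of the chosen (instruction, response) pair
    ref_logp_chosen: float,    # reference-model log-prob of the same pair
    logp_rejected: float,      # policy log-prob of the rejected pair
    ref_logp_rejected: float,  # reference-model log-prob of the rejected pair
    beta: float = 0.1,         # hypothetical temperature on the log-ratio margin
) -> float:
    """DPO-style logistic loss on a preference between two instruction-response pairs.

    In DPO both pairs share one instruction; in the joint setting sketched here
    the chosen and rejected pairs may come from different instructions.
    """
    margin = beta * ((logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected))
    return -log_sigmoid(margin)
```

When the policy and reference agree on both pairs the margin is zero and the loss equals log 2; raising the policy's probability of the chosen pair relative to the reference lowers it.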
MobileWorldBench: Towards Semantic World Modeling For Mobile Agents
ArXiv.org · 2025-12-16
Preprint · Open access · Senior author
World models have shown great utility in improving the task performance of embodied agents. While prior work largely focuses on pixel-space world models, these approaches face practical limitations in GUI settings, where predicting complex visual elements in future states is often difficult. In this work, we explore an alternative formulation of world modeling for GUI agents, where state transitions are described in natural language rather than predicting raw pixels. First, we introduce MobileWorldBench, a benchmark that evaluates the ability of vision-language models (VLMs) to function as world models for mobile GUI agents. Second, we release MobileWorld, a large-scale dataset consisting of 1.4M samples that significantly improves the world modeling capabilities of VLMs. Finally, we propose a novel framework that integrates VLM world models into the planning framework of mobile agents, demonstrating that semantic world models can directly benefit mobile agents by improving task success rates. The code and dataset are available at https://github.com/jacklishufan/MobileWorld
Enabling Autoregressive Models to Fill In Masked Tokens
ArXiv.org · 2025-02-09
Preprint · Open access
Historically, LLMs have been trained using either autoregressive (AR) or masked language modeling (MLM) objectives, with AR models gaining dominance in recent years. However, AR models are inherently incapable of masked infilling, which is the ability to predict masked tokens between past and future context. In contrast, MLM models suffer from intrinsic computational inefficiencies during both training and inference that hinder their scalability. This work introduces MARIA (Masked and Autoregressive Infilling Architecture), a novel approach that leverages the strengths of both paradigms to achieve state-of-the-art masked infilling performance. MARIA combines a pre-trained MLM and AR model by training a linear decoder that takes their concatenated hidden states as input. This minimal modification enables the AR model to perform infilling while retaining its inherent advantages in terms of faster inference with KV caching. Our results demonstrate that MARIA significantly outperforms existing methods, namely discrete diffusion models, on masked infilling tasks.
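The combining step the abstract describes, a linear decoder over the concatenated hidden states of the AR and MLM models, can be sketched in a few lines. Dimensions and weights here are hypothetical toys, and which token positions supply the two hidden states is an assumption the abstract does not settle.

```python
Vector = list[float]
Matrix = list[list[float]]


def linear_decoder_logits(h_ar: Vector, h_mlm: Vector, weight: Matrix, bias: Vector) -> Vector:
    """Vocabulary logits from a linear layer over concatenated AR and MLM hidden states."""
    h = h_ar + h_mlm  # list concatenation: [h_ar ; h_mlm]
    if len(weight) != len(bias) or any(len(row) != len(h) for row in weight):
        raise ValueError("weight must be (vocab_size, d_ar + d_mlm) and bias (vocab_size,)")
    return [sum(w * x for w, x in zip(row, h)) + b for row, b in zip(weight, bias)]


def predict_token(logits: Vector) -> int:
    """Greedy choice: index of the largest logit."""
    return max(range(len(logits)), key=logits.__getitem__)


if __name__ == "__main__":
    # Toy example: 2-dim hidden states from each model, 2-token vocabulary.
    logits = linear_decoder_logits([1.0, 0.0], [0.0, 2.0], [[1, 0, 0, 0], [0, 0, 0, 1]], [0.0, 0.0])
    print(logits, predict_token(logits))  # [1.0, 2.0] 1
```

Only the decoder's weight and bias are new parameters in this sketch; both pretrained models are used as-is, which matches the abstract's description of the change as minimal.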
d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning
ArXiv.org · 2025-04-16
Preprint · Open access · Senior author
Recent large language models (LLMs) have demonstrated strong reasoning capabilities that benefit from online reinforcement learning (RL). These capabilities have primarily been demonstrated within the left-to-right autoregressive (AR) generation paradigm. In contrast, non-autoregressive paradigms based on diffusion generate text in a coarse-to-fine manner. Although recent diffusion-based large language models (dLLMs) have achieved competitive language modeling performance compared to their AR counterparts, it remains unclear if dLLMs can also leverage recent advances in LLM reasoning. To this end, we propose d1, a framework to adapt pre-trained masked dLLMs into reasoning models via a combination of supervised finetuning (SFT) and RL. Specifically, we develop and extend techniques to improve reasoning in pretrained dLLMs: (a) we utilize a masked SFT technique to distill knowledge and instill self-improvement behavior directly from existing datasets, and (b) we introduce a novel critic-free, policy-gradient based RL algorithm called diffu-GRPO, the first integration of policy gradient methods into masked dLLMs. Through empirical studies, we investigate the performance of different post-training recipes on multiple mathematical and planning benchmarks. We find that d1 yields the best performance and significantly improves the performance of a state-of-the-art dLLM. Our code is released at https://dllm-reasoning.github.io/.
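"Critic-free" in GRPO-style methods means the advantage of each sampled completion comes from comparing its reward with the other completions for the same prompt, rather than from a learned value model. The sketch below shows only that group-relative advantage; how diffu-GRPO estimates log-probabilities for masked dLLMs is not shown, and the reward values are hypothetical.

```python
import statistics


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Critic-free advantages: standardize each completion's reward against the
    mean and standard deviation of its own group of samples for one prompt."""
    if not rewards:
        raise ValueError("need at least one reward")
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; implementations differ on this choice
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Hypothetical 0/1 correctness rewards for four completions of one prompt.
    print(group_relative_advantages([1.0, 0.0, 1.0, 0.0]))  # roughly [1, -1, 1, -1]
```

If every completion in a group earns the same reward, all advantages are zero and that prompt contributes no gradient, a known property of group-normalized advantages.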
HoneyBee: Data Recipes for Vision-Language Reasoners
arXiv (Cornell University) · 2025-10-14
Preprint · Open access
Recent advances in vision-language models (VLMs) have made them highly effective at reasoning tasks. However, the principles underlying the construction of performant VL reasoning training datasets remain poorly understood. In this work, we introduce several data curation approaches and study their impacts on VL reasoning capabilities by carefully controlling training and evaluation setups. We analyze the effects of context (image and question pair) sources, implement targeted data interventions, and explore scaling up images, questions, and chain-of-thought (CoT) solutions. Our findings reveal that (a) context source strategies significantly affect VLM performance, (b) interventions such as auxiliary signals from image captions and the inclusion of text-only reasoning yield substantial gains, and (c) scaling all data dimensions (e.g., unique questions per image and unique CoTs per image-question pair) consistently improves reasoning capability. Motivated by these insights, we introduce HoneyBee, a large-scale, high-quality CoT reasoning dataset with 2.5M examples consisting of 350K image-question pairs. VLMs trained with HoneyBee outperform state-of-the-art models across model sizes. For instance, a HoneyBee-trained VLM with 3B parameters outperforms the SOTA model and the base model by 7.8% and 24.8%, respectively, on MathVerse. Furthermore, we propose a test-time scaling strategy that reduces decoding cost by 73% without sacrificing accuracy. Overall, this work presents improved strategies for VL reasoning dataset curation research. Data is available at https://huggingface.co/datasets/facebook/HoneyBee.
ArXiv.org · 2025-06-18
Preprint · Open access · Senior author
Large Language Models (LLMs) are widely used in real-time voice chat applications, typically in combination with text-to-speech (TTS) systems to generate audio responses. However, their large size often leads to noticeable latency between the end of user input and the start of audio output, resulting in suboptimal user experiences. This latency is particularly evident when LLMs are deployed as single-user voice assistants on consumer-grade hardware with limited computing capacity. We discovered that this latency is dominated by the time it takes for the LLM to generate the first sentence, which TTS systems require as input because they synthesize audio responses sentence by sentence. To address this bottleneck, we propose Predictive Generation (PredGen), a novel framework that mitigates, or even eliminates, this delay through speculative decoding at input time. PredGen generates candidate responses while the user is still speaking, enabling the system to begin TTS processing with minimal delay. Simulated experiments on the Lmsys and MT-Bench datasets show that the proposed method can reduce latency by around 2× across a wide range of use cases, while incurring only minimal additional computation at input time, computation that would otherwise go unused.
The Pitfalls of KV Cache Compression
ArXiv.org · 2025-09-30
Preprint · Open access
KV cache compression promises increased throughput and efficiency with negligible loss in performance. While the gains in throughput are indisputable and recent literature has indeed shown minimal degradation on particular benchmarks, in general the consequences of compression in realistic scenarios such as multi-instruction prompting have been insufficiently studied. In this paper, we identify several pitfalls that practitioners should be aware of when deploying KV cache compressed LLMs. We evaluate five KV cache compression methods (StreamingLLM, SnapKV, TOVA, H2O, and K-Norm) on Llama3.1 8B and Qwen2.5 14B under multi-instruction prompting with IFEval. Importantly, we show that certain instructions degrade much more rapidly with compression, effectively causing them to be completely ignored by the LLM. As a practical example, we highlight system prompt leakage as a case study, empirically demonstrating the impact of compression on leakage and general instruction-following. We identify several factors that contribute to system prompt leakage: compression method, instruction order, and KV eviction bias. We then propose simple changes to KV cache eviction policies that can reduce the impact of these factors and improve the overall performance in multi-instruction tasks.
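StreamingLLM, one of the five methods evaluated above, keeps a few initial "attention sink" tokens plus a sliding window of recent tokens. The sketch below reproduces that position-based rule to show how an instruction sitting in the middle of a long prompt can be evicted outright, one plausible mechanism behind the instruction-order and eviction-bias effects the abstract names. The sink count of 4 and window of 8 are hypothetical budgets, and the paper's proposed eviction fixes are not shown.

```python
def streaming_llm_keep(seq_len: int, n_sink: int = 4, window: int = 8) -> list[int]:
    """Token positions kept by a StreamingLLM-style policy: the first `n_sink`
    'attention sink' tokens plus the `window` most recent tokens."""
    if seq_len <= n_sink + window:
        return list(range(seq_len))
    return list(range(n_sink)) + list(range(seq_len - window, seq_len))


def evicted(span: range, kept: list[int]) -> list[int]:
    """Positions of an instruction span that the policy has dropped from the cache."""
    kept_set = set(kept)
    return [i for i in span if i not in kept_set]


if __name__ == "__main__":
    # Hypothetical layout: a system instruction at positions 4-9 of a 30-token context.
    kept = streaming_llm_keep(seq_len=30)
    print(evicted(range(4, 10), kept))  # [4, 5, 6, 7, 8, 9]: the whole instruction is gone
```

Score-based methods such as SnapKV or H2O choose what to keep from attention statistics rather than position, so they fail differently, but the sketch illustrates why an instruction can be dropped entirely rather than degraded gradually.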
Frequent coauthors
- Stefano Ermon · 41 shared
- Hritik Bansal · 13 shared
- Pieter Abbeel · University of California, Berkeley · 12 shared
- Tung Thanh Nguyen · 9 shared
- Stefano Ermon · Stanford University · 9 shared
- Rui Shu · Northeastern University · 7 shared
- Yaron Lipman · 7 shared
- Kai-Wei Chang · 7 shared
Awards & honors
- AI2050 Early Career Fellowship (2024)
- Forbes 30 Under 30 (2023)
- Samsung AI Researcher of the Year Award (2022)
- NeurIPS Outstanding Paper Award (2021)
- ACM SIGKDD Doctoral Dissertation Award (2021)