Aditya Grover

Professor · Verified

University of California, Los Angeles · Computer Science

Active 2002–2026

h-index: 29

Citations: 13.6k

Papers: 121 · 71 last 5y

Funding: not listed



About

Aditya Grover is an Assistant Professor of Computer Science at UCLA Samueli School of Engineering. His research focuses on probabilistic machine learning for unsupervised representation learning and sequential decision making. He has contributed to advancements in neural information processing, including divergence-based generative modeling on manifolds. Grover has received numerous awards for his work, including the Forbes 30 Under 30 in Science (2023), the AI2050 Early Career Fellowship (2024), and the Samsung AI Researcher of the Year Award (2022). His work has been recognized in the media and he has been acknowledged for his outstanding contributions to the field of machine learning and AI.

Research topics

Computer Science
Machine Learning
Artificial Intelligence
Engineering
Meteorology
Ecology
Electrical engineering
Geography
Mathematics
Reliability engineering

Selected publications

Reassessing the Scaling of AI-Powered Climate Models Against Dynamical Counterparts
2026-03-14
article · Open access
Are AI-powered climate models intrinsically more efficient than traditional climate models? While progress is still needed before they become operational, hybrid AI-physics climate models and AI emulators of climate models have the potential to sharply reduce inference cost relative to traditional CPU-based models, allowing larger ensembles to explore different scenarios and sharpen uncertainty estimation. Yet this apparent efficiency becomes less obvious when the comparison includes GPU-ported dynamical climate models, and when efficiency is assessed against the effective complexity of the simulated climate system. As a first step, recognizing that a perfect apples-to-apples comparison is rarely possible from reported configurations, we synthesize reported performance for leading AI climate model emulators (e.g., ACE2, CAMulator), hybrid AI-physics models (e.g., CliMA, NeuralGCM), and GPU-accelerated traditional models (e.g., SCREAM, ICON). We examine two complementary scaling views. The first compares throughput (simulated years per day) per accelerator (GPUs or TPUs) and per prognostic variable, as a function of horizontal grid spacing. The second compares the same normalized throughput against an effective complexity proxy, defined as the number of vertical levels divided by the product of the time step and the squared horizontal grid spacing, to account for the simulated vertical structure and, importantly, time-step constraints imposed by numerical stability. We find that AI-powered models can show favorable apparent scaling with horizontal resolution in raw throughput, but that the advantage becomes modest once effective complexity is accounted for: at comparable complexity, AI climate models do not appear intrinsically more efficient than GPU-ported dynamical models.
Hybrid approaches occupy a distinct middle ground: their acceleration and added value come primarily from learned parameterizations that improve the representation of unresolved processes while the overall model retains a physically-based dynamical core, including explicit conservation laws. AI climate model emulators, by contrast, offer their clearest computational advantage through task-targeted prediction, where a limited set of climate-relevant variables can be directly simulated on the grid of interest. This avoids integrating the full high-frequency, multivariate state at the short time step traditionally required for numerical stability, which is especially advantageous when emulating a fine-resolution reference model with a coarser emulator. Diverse downscaling or targeted post-processing strategies can further substitute for explicit fine-scale resolution when observations are available, enabling inexpensive local or hazard-specific risk assessment at decadal to multi-decadal time horizons.
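The two scaling quantities named in this abstract can be written out as a minimal sketch. The function names, argument names, and units (seconds, kilometres) are assumptions made for illustration, and the numbers in the usage lines are hypothetical rather than values reported in the paper.

```python
def normalized_throughput(sypd: float, n_accelerators: int, n_prognostic_vars: int) -> float:
    """Simulated years per day, per accelerator and per prognostic variable."""
    if n_accelerators <= 0 or n_prognostic_vars <= 0:
        raise ValueError("counts must be positive")
    return sypd / (n_accelerators * n_prognostic_vars)


def effective_complexity(n_levels: int, dt_seconds: float, dx_km: float) -> float:
    """Proxy from the abstract: vertical levels / (time step * grid spacing squared).

    The units chosen here (seconds, kilometres) are an assumption; any
    consistent choice only rescales the proxy.
    """
    if dt_seconds <= 0 or dx_km <= 0:
        raise ValueError("time step and grid spacing must be positive")
    return n_levels / (dt_seconds * dx_km ** 2)


# Hypothetical configurations: same levels and time step, grid spacing halved.
coarse = effective_complexity(n_levels=60, dt_seconds=300.0, dx_km=50.0)
fine = effective_complexity(n_levels=60, dt_seconds=300.0, dx_km=25.0)
ratio = fine / coarse  # halving the grid spacing quadruples the proxy
```

Because the grid spacing enters squared and the time step enters linearly, a finer model that also needs a shorter time step for stability moves further along the complexity axis than its grid spacing alone would suggest.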
Iceberg: Enhancing HLS Modeling with Synthetic Data
2025-06-26 · 1 citation
article
Deep learning-based prediction models for High-Level Synthesis (HLS) of hardware designs often struggle to generalize. In this paper, we study how to close the generalizability gap of these models through pretraining on synthetic data and introduce Iceberg, a synthetic data augmentation approach that expands both large language model (LLM)-generated programs and weak labels of unseen design configurations. Our weak label generation method is integrated with an in-context model architecture, enabling meta-learning from actual and proximate labels. Iceberg improves the geometric mean modeling accuracy by 86.4% when adapted to six real-world applications with few-shot examples and achieves a 2.47× and a 1.12× better offline DSE performance when adapting to two different test datasets. Our open-sourced code is here: https://github.com/UCLA-VAST/iceberg.
IndiaWeatherBench: A Dataset and Benchmark for Data-Driven Regional Weather Forecasting over India
ArXiv.org · 2025-08-31
preprint · Open access · Senior author
Regional weather forecasting is a critical problem for localized climate adaptation, disaster mitigation, and sustainable development. While machine learning has shown impressive progress in global weather forecasting, regional forecasting remains comparatively underexplored. Existing efforts often use different datasets and experimental setups, limiting fair comparison and reproducibility. We introduce IndiaWeatherBench, a comprehensive benchmark for data-driven regional weather forecasting focused on the Indian subcontinent. IndiaWeatherBench provides a curated dataset built from high-resolution regional reanalysis products, along with a suite of deterministic and probabilistic metrics to facilitate consistent training and evaluation. To establish strong baselines, we implement and evaluate a range of models across diverse architectures, including UNets, Transformers, and Graph-based networks, as well as different boundary conditioning strategies and training objectives. While focused on India, IndiaWeatherBench is easily extensible to other geographic regions. We open-source all raw and preprocessed datasets, model implementations, and evaluation pipelines to promote accessibility and future development. We hope IndiaWeatherBench will serve as a foundation for advancing regional weather forecasting research. Code is available at https://github.com/tung-nd/IndiaWeatherBench.
Comparing Bad Apples to Good Oranges: Aligning Large Language Models via Joint Preference Optimization
2025-01-01 · 1 citation
article · Open access · Senior author
A common technique for aligning large language models (LLMs) relies on acquiring human preferences by comparing multiple generations conditioned on a fixed context. This method, however, relies solely on pairwise comparisons, where the generations are evaluated within an identical context. While effective, such conditional preferences often fail to encompass the nuanced and multidimensional nature of human preferences. In this work, we revisit the traditional paradigm of preference acquisition and propose a new axis based on eliciting preferences jointly over the instruction-response pairs. Unlike prior preference optimizations, which are designed for conditional ranking protocols (e.g., DPO), we propose Joint Preference Optimization (JPO), a new preference optimization objective that upweights the joint probability of the chosen instruction-response pair over the rejected instruction-response pair. Interestingly, LLMs trained with joint instruction-response preference data using JPO outperform LLMs trained with DPO by 5.2% and 3.3% win-rate for summarization and open-ended dialogue datasets, respectively. Our findings reveal that joint preferences over instruction and response pairs can significantly enhance the alignment of LLMs by tapping into a broader spectrum of human preference elicitation. The data and code are available at https://github.com/Hritikbansal/jpo.
MobileWorldBench: Towards Semantic World Modeling For Mobile Agents
ArXiv.org · 2025-12-16
preprint · Open access · Senior author
World models have shown great utility in improving the task performance of embodied agents. While prior work largely focuses on pixel-space world models, these approaches face practical limitations in GUI settings, where predicting complex visual elements in future states is often difficult. In this work, we explore an alternative formulation of world modeling for GUI agents, where state transitions are described in natural language rather than predicting raw pixels. First, we introduce MobileWorldBench, a benchmark that evaluates the ability of vision-language models (VLMs) to function as world models for mobile GUI agents. Second, we release MobileWorld, a large-scale dataset consisting of 1.4M samples, that significantly improves the world modeling capabilities of VLMs. Finally, we propose a novel framework that integrates VLM world models into the planning framework of mobile agents, demonstrating that semantic world models can directly benefit mobile agents by improving task success rates. The code and dataset are available at https://github.com/jacklishufan/MobileWorld
Enabling Autoregressive Models to Fill In Masked Tokens
ArXiv.org · 2025-02-09
preprint · Open access
Historically, LLMs have been trained using either autoregressive (AR) or masked language modeling (MLM) objectives, with AR models gaining dominance in recent years. However, AR models are inherently incapable of masked infilling, which is the ability to predict masked tokens between past and future context. In contrast, MLM models suffer from intrinsic computational inefficiencies during both training and inference that hinder their scalability. This work introduces MARIA (Masked and Autoregressive Infilling Architecture), a novel approach that leverages the strengths of both paradigms to achieve state-of-the-art masked infilling performance. MARIA combines a pre-trained MLM and AR model by training a linear decoder that takes their concatenated hidden states as input. This minimal modification enables the AR model to perform infilling while retaining its inherent advantages in terms of faster inference with KV caching. Our results demonstrate that MARIA significantly outperforms existing methods, namely discrete diffusion models, on masked infilling tasks.
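The combination step described in this abstract (a linear decoder over the concatenated hidden states of an MLM and an AR model) can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the tiny dimensions, and the hand-picked weights are all hypothetical, and a real system would learn the weights and operate on model-sized tensors.

```python
from typing import List


def linear_decode(h_mlm: List[float], h_ar: List[float],
                  weight: List[List[float]], bias: List[float]) -> List[float]:
    """Map concatenated hidden states to vocabulary logits with one linear layer."""
    h = h_mlm + h_ar  # concatenate the two models' hidden states for one position
    if len(weight) != len(bias) or any(len(row) != len(h) for row in weight):
        raise ValueError("weight must be [vocab x (d_mlm + d_ar)], bias [vocab]")
    return [sum(w * x for w, x in zip(row, h)) + b for row, b in zip(weight, bias)]


def argmax(values: List[float]) -> int:
    """Index of the largest value, i.e. the predicted token id."""
    return max(range(len(values)), key=values.__getitem__)


# Toy example: 2-dim MLM state, 2-dim AR state, vocabulary of 3 tokens.
h_mlm = [1.0, 0.0]
h_ar = [0.0, 2.0]
weight = [
    [1.0, 0.0, 0.0, 0.0],  # token 0 reads only the MLM state
    [0.0, 0.0, 0.0, 1.0],  # token 1 reads only the AR state
    [0.5, 0.5, 0.5, 0.5],  # token 2 mixes both
]
bias = [0.0, 0.0, 0.0]
logits = linear_decode(h_mlm, h_ar, weight, bias)
predicted_token = argmax(logits)
```

In this sketch only the decoder weights would be trained, which matches the abstract's description of the change as a minimal modification on top of two pre-trained models.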
d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning
ArXiv.org · 2025-04-16
preprint · Open access · Senior author
Recent large language models (LLMs) have demonstrated strong reasoning capabilities that benefit from online reinforcement learning (RL). These capabilities have primarily been demonstrated within the left-to-right autoregressive (AR) generation paradigm. In contrast, non-autoregressive paradigms based on diffusion generate text in a coarse-to-fine manner. Although recent diffusion-based large language models (dLLMs) have achieved competitive language modeling performance compared to their AR counterparts, it remains unclear if dLLMs can also leverage recent advances in LLM reasoning. To this end, we propose d1, a framework to adapt pre-trained masked dLLMs into reasoning models via a combination of supervised finetuning (SFT) and RL. Specifically, we develop and extend techniques to improve reasoning in pretrained dLLMs: (a) we utilize a masked SFT technique to distill knowledge and instill self-improvement behavior directly from existing datasets, and (b) we introduce a novel critic-free, policy-gradient based RL algorithm called diffu-GRPO, the first integration of policy gradient methods to masked dLLMs. Through empirical studies, we investigate the performance of different post-training recipes on multiple mathematical and planning benchmarks. We find that d1 yields the best performance and significantly improves performance of a state-of-the-art dLLM. Our code is released at https://dllm-reasoning.github.io/.
HoneyBee: Data Recipes for Vision-Language Reasoners
arXiv (Cornell University) · 2025-10-14
preprint · Open access
Recent advances in vision-language models (VLMs) have made them highly effective at reasoning tasks. However, the principles underlying the construction of performant VL reasoning training datasets remain poorly understood. In this work, we introduce several data curation approaches and study their impacts on VL reasoning capabilities by carefully controlling training and evaluation setups. We analyze the effects of context (image and question pair) sources, implement targeted data interventions, and explore scaling up images, questions, and chain-of-thought (CoT) solutions. Our findings reveal that (a) context source strategies significantly affect VLM performance, (b) interventions such as auxiliary signals from image captions and the inclusion of text-only reasoning yield substantial gains, and (c) scaling all data dimensions (e.g., unique questions per image and unique CoTs per image-question pair) consistently improves reasoning capability. Motivated by these insights, we introduce HoneyBee, a large-scale, high-quality CoT reasoning dataset with 2.5M examples consisting of 350K image-question pairs. VLMs trained with HoneyBee outperform state-of-the-art models across model sizes. For instance, a HoneyBee-trained VLM with 3B parameters outperforms the SOTA model and the base model by 7.8% and 24.8%, respectively, on MathVerse. Furthermore, we propose a test-time scaling strategy that reduces decoding cost by 73% without sacrificing accuracy. Overall, this work presents improved strategies for VL reasoning dataset curation research. Data is available at https://huggingface.co/datasets/facebook/HoneyBee.
PredGen: Accelerated Inference of Large Language Models through Input-Time Speculation for Real-Time Speech Interaction
ArXiv.org · 2025-06-18
preprint · Open access · Senior author
Large Language Models (LLMs) are widely used in real-time voice chat applications, typically in combination with text-to-speech (TTS) systems to generate audio responses. However, their large size often leads to noticeable latency between the end of user input and the start of audio output, resulting in suboptimal user experiences. This latency is particularly evident when LLMs are deployed as single-user voice assistants on consumer-grade hardware with limited computing capacity. We discovered that this latency is primarily dominated by the time it takes for the LLMs to generate the first sentence, which is required as input by the TTS systems that synthesize audio responses on a sentence-by-sentence basis. To address this bottleneck, we propose Predictive Generation (PredGen), a novel framework that mitigates, or even eliminates, this delay through speculative decoding at input time. PredGen generates candidate responses while the user is still speaking, enabling the system to begin TTS processing with minimal delay. Simulated experiments on the Lmsys and MT-Bench datasets show that the proposed method can effectively reduce the latency by around 2x across a wide range of use cases, while incurring only minimal additional computation cost at input time, computation that would otherwise go unused.
The Pitfalls of KV Cache Compression
ArXiv.org · 2025-09-30
preprint · Open access
KV cache compression promises increased throughput and efficiency with negligible loss in performance. While the gains in throughput are indisputable and recent literature has indeed shown minimal degradation on particular benchmarks, in general the consequences of compression in realistic scenarios such as multi-instruction prompting have been insufficiently studied. In this paper, we identify several pitfalls that practitioners should be aware of when deploying KV cache compressed LLMs. We evaluate five KV cache compression methods (StreamingLLM, SnapKV, TOVA, H2O, and K-Norm) on Llama3.1 8B and Qwen2.5 14B under multi-instruction prompting with IFEval. Importantly, we show that certain instructions degrade much more rapidly with compression, effectively causing them to be completely ignored by the LLM. As a practical example, we highlight system prompt leakage as a case study, empirically demonstrating the impact of compression on leakage and general instruction-following. We identify several factors that contribute to system prompt leakage: compression method, instruction order, and KV eviction bias. We then propose simple changes to KV cache eviction policies that can reduce the impact of these factors and improve the overall performance in multi-instruction tasks.
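To make the idea of position-dependent eviction concrete, here is a minimal sketch of a sink-plus-recent-window retention rule, the general scheme associated with StreamingLLM (keep a few initial tokens and a sliding window of the latest ones). It is an illustration only, not the paper's evaluation code or its proposed policy changes; the function names and the token positions in the example are hypothetical.

```python
from typing import List


def sink_plus_recent_keep(seq_len: int, n_sink: int, window: int) -> List[int]:
    """Positions whose KV entries are kept: the first n_sink and the last `window`."""
    if n_sink < 0 or window < 0:
        raise ValueError("n_sink and window must be non-negative")
    if seq_len <= n_sink + window:
        return list(range(seq_len))  # cache fits in the budget, nothing is evicted
    return list(range(n_sink)) + list(range(seq_len - window, seq_len))


def surviving_fraction(span: range, kept: List[int]) -> float:
    """Fraction of a token span (e.g. one instruction) still present in the cache."""
    kept_set = set(kept)
    return sum(1 for pos in span if pos in kept_set) / len(span)


# Hypothetical prompt of 10 tokens with an instruction at positions 3..5.
kept = sink_plus_recent_keep(seq_len=10, n_sink=2, window=3)
middle_instruction = surviving_fraction(range(3, 6), kept)
last_instruction = surviving_fraction(range(7, 10), kept)
```

Under this rule an instruction in the middle of the prompt loses all of its cache entries while one at the end keeps all of them, which is one simple way instruction order can interact with an eviction policy.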

Frequent coauthors

Stefano Ermon
41 shared
Hritik Bansal
13 shared
Pieter Abbeel
University of California, Berkeley
12 shared
Tung Thanh Nguyen
9 shared
Stefano Ermon
Stanford University
9 shared
Rui Shu
Northeastern University
7 shared
Yaron Lipman
7 shared
Kai-Wei Chang
7 shared

Awards & honors

AI2050 Early Career Fellowship (2024)
Forbes 30 Under 30 (2023)
Samsung AI Researcher of the Year Award (2022)
NeurIPS Outstanding Paper Award (2021)
ACM SIGKDD Doctoral Dissertation Award (2021)
