Maneesh Agrawala

Professor of Computer Science · Verified

Stanford University · Symbolic Systems

Active 1985–2026

h-index: 74

Citations: 19.1k

Papers: 315 (65 in the last 5 years)

Funding: $1.9M


About

Maneesh Agrawala is the Forest Baskett Professor of Computer Science at Stanford University and the Director of the Brown Institute for Media Innovation. He earned a B.S. in Mathematics in 1994 and a Ph.D. in Computer Science in 2002, both from Stanford University. At Stanford he holds professorships in the Departments of Computer Science and Electrical Engineering and is a faculty affiliate of the Institute for Human-Centered Artificial Intelligence (HAI). Before his current position, he was a Professor of Electrical Engineering and Computer Science at the University of California, Berkeley, from 2005 to 2015. Agrawala's research focuses on computer graphics, human-computer interaction, and visualization, with an emphasis on how cognitive design principles can improve the effectiveness of audio/visual media. His work aims to discover these design principles and implement them in both interactive and automated design tools. His honors include a MacArthur Foundation Fellowship, an NSF CAREER Award, a SIGGRAPH Significant New Researcher Award, and fellowships from the Sloan Foundation and the ACM. He also serves as an advisor for the Human Computation Journal and holds other professional and advisory roles.

Research topics

Computer Science
Artificial Intelligence
Multimedia
World Wide Web
Algorithm
Physics
Business
Programming language
Advertising
Political Science
Sociology
Theoretical computer science
Computer vision
Mathematics
Geography
Engineering
Discrete mathematics
Database
Linguistics
Psychology

Selected publications

Mode Seeking meets Mean Seeking for Fast Long Video Generation
ArXiv.org · 2026-02-27
article · Open access
Scaling video generation from seconds to minutes faces a critical bottleneck: while short-video data is abundant and high-fidelity, coherent long-form data is scarce and limited to narrow domains. To address this, we propose a training paradigm where Mode Seeking meets Mean Seeking, decoupling local fidelity from long-term coherence based on a unified representation via a Decoupled Diffusion Transformer. Our approach utilizes a global Flow Matching head trained via supervised learning on long videos to capture narrative structure, while simultaneously employing a local Distribution Matching head that aligns sliding windows to a frozen short-video teacher via a mode-seeking reverse-KL divergence. This strategy enables the synthesis of minute-scale videos that learns long-range coherence and motions from limited long videos via supervised flow matching, while inheriting local realism by aligning every sliding-window segment of the student to a frozen short-video teacher, resulting in a few-step fast long video generator. Evaluations show that our method effectively closes the fidelity-horizon gap by jointly improving local sharpness, motion and long-range consistency. Project website: https://primecai.github.io/mmm/.
View-oriented Conversation Compiler for Agent Trace Analysis
ArXiv.org · 2026-03-31
article · Open access · Senior author
Agent traces carry increasing analytical value in agentic systems and context engineering, yet most prior work treats conversation format as a trivial implementation detail. Modern agent conversations, however, contain deeply structured content, including nested tool calls and results, chain-of-thought reasoning blocks, sub-agent invocations, context-window compaction boundaries, and harness-injected system directives, whose complexity far exceeds that of simple user-assistant exchanges. Feeding such traces to a reflector or other analytical mechanism in plain text, JSON, YAML, or via grep can materially degrade analysis quality. This paper presents VCC (View-oriented Conversation Compiler), a compiler (lex, parse, IR, lower, emit) that transforms raw agent JSONL logs into a family of structured views: a full view (lossless transcript serving as the canonical line-number coordinate system), a user-interface (UI) view (reconstructing the interaction as the user actually perceived it), and an adaptive view (a structure-preserving projection governed by a relevance predicate). In a context-engineering experiment on AppWorld, replacing only the reflector's input format, from raw JSONL to VCC-compiled views, leads to higher pass rates across all three model configurations tested, while cutting reflector token consumption by half to two-thirds and producing more concise learned memory. These results suggest that message format functions as infrastructure for context engineering, not as an incidental implementation choice.
Teaching Spell Checkers to Teach: Pedagogical Program Synthesis for Interactive Learning
2026-03-03 · 1 citation
article · Open access
Spelling taught through memorization often fails many learners, particularly children with language-based learning disorders who struggle with the phonological skills necessary to spell words accurately. Educators such as speech-language pathologists (SLPs) address this instructional gap by using an inquiry-based approach to teach spelling that targets the phonology, morphology, meaning, and etymology of words. Yet, these strategies rarely appear in everyday writing tools, which simply detect and autocorrect errors. We introduce SPIRE (Spelling Inquiry Engine), a spell check system that brings this inquiry-based pedagogy into the act of composition. SPIRE implements Pedagogical Program Synthesis, a novel approach for operationalizing the inherently dynamic pedagogy of spelling instruction. SPIRE represents SLP instructional moves in a domain-specific language, synthesizes tailored programs in real-time from learner errors, and renders them as interactive interfaces for inquiry-based interventions. With SPIRE, spelling errors become opportunities to explore word meanings, word structures, morphological families, word origins, and grapheme-phoneme correspondences, supporting metalinguistic reasoning alongside correction. Evaluation with SLPs and learners shows alignment with professional practice and potential for integration into writing workflows.
Self-Consistency for LLM-Based Motion Trajectory Generation and Verification
arXiv (Cornell University) · 2026-03-31
preprint · Open access · Senior author
Self-consistency has proven to be an effective technique for improving LLM performance on natural language reasoning tasks in a lightweight, unsupervised manner. In this work, we study how to adapt self-consistency to visual domains. Specifically, we consider the generation and verification of LLM-produced motion graphics trajectories. Given a prompt (e.g., "Move the circle in a spiral path"), we first sample diverse motion trajectories from an LLM, and then identify groups of consistent trajectories via clustering. Our key insight is to model the family of shapes associated with a prompt as a prototype trajectory paired with a group of geometric transformations (e.g., rigid, similarity, and affine). Two trajectories can then be considered consistent if one can be transformed into the other under the warps allowable by the transformation group. We propose an algorithm that automatically recovers a shape family, using hierarchical relationships between a set of candidate transformation groups. Our approach improves the accuracy of LLM-based trajectory generation by 4-6%. We further extend our method to support verification, observing 11% precision gains over VLM baselines. Our code and dataset are available at https://majiaju.io/trajectory-self-consistency .
SimStep: Human-in-the-Loop Authoring of Interactive Educational Simulations Through Task-Level Abstractions
2026-04-13 · 1 citation
article · Open access
Generative AI enables educators to create interactive learning content by describing goals in natural language. However, without programming affordances such as traceability, refinement, and debugging, teachers struggle to align simulations with learners’ needs, refine them step by step, or verify that they reflect intended learning concepts. We propose a task-level abstraction approach that structures authoring as a sequence of representations, mirroring how teachers plan lessons and providing checkpoints for specification, inspection, and refinement. We instantiate this approach in SimStep, an authoring environment that scaffolds simulation design with four abstractions, including Concept Graph, Scenario Graph, Learning Goal Graph, and UI Graph, and introduces an inverse correction process to revise hidden model assumptions without requiring code manipulation. A technical evaluation shows that these abstractions preserve fidelity across transformations, while a user study with educators demonstrates their effectiveness in authoring simulations. Our work reframes AI-assisted programming as human–AI co-authoring through structured, domain-aligned abstractions.
Mode Seeking meets Mean Seeking for Fast Long Video Generation
Open MIND · 2026-02-27
preprint
LouvreSAE: Sparse Autoencoders for Interpretable and Controllable Style Transfer
arXiv (Cornell University) · 2025-12-22
preprint · Open access
Artistic style transfer in generative models remains a significant challenge, as existing methods often introduce style only via model fine-tuning, additional adapters, or prompt engineering, all of which can be computationally expensive and may still entangle style with subject matter. In this paper, we introduce a training- and inference-light, interpretable method for representing and transferring artistic style. Our approach leverages an art-specific Sparse Autoencoder (SAE) on top of latent embeddings of generative image models. Trained on artistic data, our SAE learns an emergent, largely disentangled set of stylistic and compositional concepts, corresponding to style-related elements pertaining to brushwork, texture, and color palette, as well as semantic and structural concepts. We call it LouvreSAE and use it to construct style profiles: compact, decomposable steering vectors that enable style transfer without any model updates or optimization. Unlike prior concept-based style transfer methods, our method requires no fine-tuning, no LoRA training, and no additional inference passes, enabling direct steering of artistic styles from only a few reference images. We validate our method on ArtBench10, achieving or surpassing existing methods on style evaluations (VGG Style Loss and CLIP Score Style) while being 1.7-20x faster and, critically, interpretable.
Uncovering Conceptual Blindspots in Generative Image Models Using Sparse Autoencoders
ArXiv.org · 2025-06-24
preprint · Open access
Despite their impressive performance, generative image models trained on large-scale datasets frequently fail to produce images with seemingly simple concepts -- e.g., human hands or objects appearing in groups of four -- that are reasonably expected to appear in the training data. These failure modes have largely been documented anecdotally, leaving open the question of whether they reflect idiosyncratic anomalies or more structural limitations of these models. To address this, we introduce a systematic approach for identifying and characterizing "conceptual blindspots" -- concepts present in the training data but absent or misrepresented in a model's generations. Our method leverages sparse autoencoders (SAEs) to extract interpretable concept embeddings, enabling a quantitative comparison of concept prevalence between real and generated images. We train an archetypal SAE (RA-SAE) on DINOv2 features with 32,000 concepts -- the largest such SAE to date -- enabling fine-grained analysis of conceptual disparities. Applied to four popular generative models (Stable Diffusion 1.5/2.1, PixArt, and Kandinsky), our approach reveals specific suppressed blindspots (e.g., bird feeders, DVD discs, and whitespaces on documents) and exaggerated blindspots (e.g., wood background texture and palm trees). At the individual datapoint level, we further isolate memorization artifacts -- instances where models reproduce highly specific visual templates seen during training. Overall, we propose a theoretically grounded framework for systematically identifying conceptual blindspots in generative models by assessing their conceptual fidelity with respect to the underlying data-generating process.

Recent grants

DC: Medium: Collaborative Research: Data Intensive Computing: Scalable, Social Data Analysis
NSF · $667k · 2010–2014
CAREER: Design Principles, Algorithms, and Interfaces for Visual Communication
NSF · $400k · 2007–2012
III: Small: Extracting Data and Structure from Charts and Graphs for Analysis, Reuse and Indexing
NSF · $499k · 2017–2021
HCC-Small: Collaborative Research: Design and Evaluation of the Next Generation of E-book Readers.
NSF · $76k · 2008–2011
HCC: Small: Collaborative Research: Graphical Perception Revisited: Developing and Validating Design Guidelines for Data Visualization
NSF · $250k · 2010–2014

Frequent coauthors

David Salesin · University of Washington · 39 shared
Pat Hanrahan · Stanford University · 38 shared
Doantam Phan · 30 shared
Barbara Tversky · Stanford University · 30 shared
Julie Heiser · 30 shared
Jeff Klingner · Perfect Harmony Health · 28 shared
Chris Stolte · 27 shared
Wilmot Li · Adobe Systems (United States) · 27 shared

Labs

Maneesh Agrawala's Lab · PI
Computer Graphics, Human-Computer Interaction, Visualization, Information Visualization, Scientific Visualization

Education

Ph.D.
Stanford University
M.S.
University of California, Berkeley
B.S.
Stanford University

Awards & honors

Research Grant, Okawa Foundation (2006)
CAREER Award, National Science Foundation (2007)
Research Fellow, Alfred P. Sloan Foundation (2007)
Significant New Researcher Award, ACM SIGGRAPH (2008)
Fellow, MacArthur Foundation (2009)
