
About
Risi Kondor is an Associate Professor of Computer Science at the University of Chicago, with a focus on basic machine learning methodology inspired by ideas from algebra and computational harmonic analysis. His research has recently concentrated on the intersection of machine learning and science, including the development of novel graph neural network architectures for chemistry, approaches to learning molecular force fields, and neural network approximations to quantum states. His group has made foundational contributions to the theory of group equivariant neural networks, which are used in physics, chemistry, computer vision, and medical imaging. Additionally, his team develops high-performance, open-source software libraries to support their research.
Risi Kondor holds a BA in mathematics from Cambridge University and a PhD in computer science from Columbia University. He has completed postdoctoral appointments at the Gatsby Unit (UCL) and the Center for the Mathematics of Information at Caltech. His academic background also includes a diploma in computational fluid dynamics from the Von Karman Institute and an MS in machine learning from Carnegie Mellon University. His professional experience includes work in the deep learning team at Amazon Web Services and at the Center for Computational Mathematics at the Flatiron Institute.
His research focuses on computational harmonic analysis, machine learning, and the mathematical foundations of computation, including algorithm design, complexity, and logic. He is a member of both the Department of Computer Science and the Department of Statistics at the University of Chicago.
Research topics
- Computer Science
- Artificial Intelligence
- Theoretical physics
- Pure mathematics
- Quantum mechanics
- Mathematics
- Physics
- Classical mechanics
- Particle physics
- Geometry
- Mathematical physics
- Combinatorics
Selected publications
ArXiv.org · 2026-05-12
article · Open access · Senior author
Neural representations are not unique objects. Even when two systems realize the same downstream computation, their hidden coordinates may differ by reparameterization. A probe family intended to reveal structure already present in a representation should therefore be stable under the relevant representation symmetries rather than be tied to a particular basis. We study this group action in the tractable exact setting of the final readout layer, where equivalent realizations induce affine changes of hidden coordinates. The resulting symmetry principle singles out a unique hierarchy of shallow coordinate-stable probes, with linear probes as its degree-1 member. We also show that a natural object for cross-model probe transfer is a shared probe-visible quotient (the representation modulo directions invisible to the probe family) rather than the full hidden state. Experiments on synthetic and real-world tasks support both predictions, showing where degree-2 probes help beyond linear ones and how quotient-based transfer enables coverage-aware monitor portability across model families. These results point toward a broader geometric representation theory of neural probing, with coverage-aware monitor transfer as a concrete operational consequence.
Machine Learning Science and Technology · 2026-04-01
article · Open access · Senior author
Multiresolution matrix factorization (MMF) is unusual amongst fast matrix factorization algorithms in that it does not make a low rank assumption. This makes MMF especially well suited to modeling certain types of graphs with complex multiscale or hierarchical structure. While MMF promises to yield a useful wavelet basis, finding the factorization itself is hard, and existing greedy methods tend to be brittle. In this paper, we propose a ‘learnable’ version of MMF that carefully optimizes the factorization using metaheuristics, specifically evolutionary algorithms and directed evolution, along with Stiefel manifold optimization through backpropagating errors. We show that the resulting wavelet basis far outperforms prior MMF algorithms and gives comparable performance on standard learning tasks on graphs. Furthermore, we construct wavelet neural networks that learn on graphs in the spectral domain using the wavelet basis produced by our MMF learning algorithm. Our wavelet networks are competitive against other state-of-the-art methods in molecular graph classification and node classification on citation graphs. We release our implementation at https://github.com/HySonLab/LearnMMF.
Graph Contrastive Learning versus Untrained Baselines: The Role of Dataset Size
ArXiv.org · 2025-09-01
preprint · Open access
Graph Contrastive Learning (GCL) has emerged as a leading paradigm for self-supervised learning on graphs, with strong performance reported on standardized datasets and growing applications ranging from genomics to drug discovery. We ask a basic question: does GCL actually outperform untrained baselines? We find that GCL's advantage depends strongly on dataset size and task difficulty. On standard datasets, untrained Graph Neural Networks (GNNs), simple multilayer perceptrons, and even handcrafted statistics can rival or exceed GCL. On the large molecular dataset ogbg-molhiv, we observe a crossover: GCL lags at small scales but pulls ahead beyond a few thousand graphs, though this gain eventually plateaus. On synthetic datasets, GCL accuracy approximately scales with the logarithm of the number of graphs and its performance gap (compared with untrained GNNs) varies with respect to task complexity. Moving forward, it is crucial to identify the role of dataset size in benchmarks and applications, as well as to design GCL algorithms that avoid performance plateaus.
Probabilistic Modeling of Latent Agentic Substructures in Deep Neural Networks
arXiv (Cornell University) · 2025-09-08
preprint · Open access
We develop a theory of intelligent agency grounded in probabilistic modeling for neural models. Agents are represented as outcome distributions with epistemic utility given by log score, and compositions are defined through weighted logarithmic pooling that strictly improves every member's welfare. We prove that strict unanimity is impossible under linear pooling or in binary outcome spaces, but possible with three or more outcomes. Our framework admits recursive structure via cloning invariance, continuity, and openness, while tilt-based analysis rules out trivial duplication. Finally, we formalize an agentic alignment phenomenon in LLMs using our theory: eliciting a benevolent persona ("Luigi") induces an antagonistic counterpart ("Waluigi"), while a manifest-then-suppress Waluigi strategy yields strictly larger first-order misalignment reduction than pure Luigi reinforcement alone. These results clarify how a principled mathematical framework for the way subagents coalesce into coherent higher-level entities provides novel implications for alignment in agentic AI systems.
The principles behind equivariant neural networks for physics and chemistry
Proceedings of the National Academy of Sciences · 2025-10-06 · 4 citations
article · Open access · 1st author · Corresponding
A distinguishing feature of the neural network models used in physics and chemistry is that they must obey basic underlying symmetries, such as symmetry to translations, rotations, and the exchange of identical particles. Over the course of the last several years, the artificial neural networks community has developed a class of networks called group-equivariant neural nets that can efficiently "bake in" such symmetries into the structure of the network itself. Equivariant neural nets leverage ideas from group representation theory and express all variables in the generalized Fourier space corresponding to the underlying group. In this article, we review this formalism and derive the general form of operations allowable in equivariant neural networks. Specifically, we discuss why the Clebsch-Gordan transform appears in such architectures, and how it can play the role of an equivariant nonlinearity.
Physical Review A · 2025-11-19 · 8 citations
preprint · Open access
We introduce a framework of equivariant convolutional quantum algorithms tailored to a number of machine-learning tasks on physical systems with arbitrary $\mathrm{SU}(d)$ symmetries. It allows us to enhance a natural model of quantum computation, permutational quantum computing (PQC) [Jordan, Quantum Inf. Comput. 10, 470 (2010)], and define a more powerful model: $\mathrm{PQC}+$. While PQC was shown to be efficiently classically simulatable, we exhibit a problem which can be efficiently solved on a $\mathrm{PQC}+$ machine, whereas no classical polynomial time algorithm is known, thus providing evidence against $\mathrm{PQC}+$ being classically simulatable. We further discuss practical quantum machine learning algorithms which can be carried out in the paradigm of $\mathrm{PQC}+$.
A Geometric Approach to Steerable Convolutions
ArXiv.org · 2025-10-21
preprint · Open access · Senior author
In contrast to the somewhat abstract, group theoretical approach adopted by many papers, our work provides a new and more intuitive derivation of steerable convolutional neural networks in $d$ dimensions. This derivation is based on geometric arguments and fundamental principles of pattern matching. We offer an intuitive explanation for the appearance of the Clebsch-Gordan decomposition and spherical harmonic basis functions. Furthermore, we suggest a novel way to construct steerable convolution layers using interpolation kernels that improve upon existing implementations and offer greater robustness to noisy data.
Sign Rank Limitations for Inner Product Graph Decoders
arXiv (Cornell University) · 2024-02-06
preprint · Open access · Senior author
Inner product-based decoders are among the most influential frameworks used to extract meaningful data from latent embeddings. However, such decoders have shown limitations in representation capacity in numerous works within the literature, which have been particularly notable in graph reconstruction problems. In this paper, we provide the first theoretical elucidation of this pervasive phenomenon in graph data, and suggest straightforward modifications to circumvent this issue without deviating from the inner product framework.
Steerable Transformers for Volumetric Data
arXiv (Cornell University) · 2024-05-24
preprint · Open access · Senior author
We introduce Steerable Transformers, an extension of the Vision Transformer mechanism that maintains equivariance to the special Euclidean group $\mathrm{SE}(d)$. We propose an equivariant attention mechanism that operates on features extracted by steerable convolutions. Operating in Fourier space, our network utilizes Fourier space non-linearities. Our experiments in both two and three dimensions show that adding steerable transformer layers to steerable convolutional networks enhances performance.
Frequent coauthors
- Brandon Anderson · 32 shared
- Martin Vögele · Schrodinger (United States) · 20 shared
- Ron O. Dror · Stanford University · 20 shared
- Stephan Eismann · Stanford University · 20 shared
- Patricia Suriana · 20 shared
- Alexander Derry · Stanford University · 20 shared
- Russ B. Altman · Stanford University · 20 shared
- Yianni Laloudakis · Stanford University · 20 shared
Education
- Ph.D., Computer Science
Columbia University
- M.S., Machine Learning
Carnegie Mellon University
- Diploma, Computational Fluid Dynamics
Von Karman Institute
- B.A., Mathematics
Cambridge University
Awards & honors
- 2016 DARPA Young Faculty Award
- 2012 SIGMOD Test of Time Award