Eli Shlizerman

Associate Professor

University of Washington · Applied Mathematics

Active 2005–2026

h-index: 20

Citations: 1.8k

Papers: 131 (62 in the last 5 years)

Funding: $1.2M


About

Eli Shlizerman is an Associate Professor in the Department of Applied Mathematics at the University of Washington. He completed his B.Sc. in Mathematics and Computer Science (magna cum laude) in 2002, and earned both an M.Sc. and Ph.D. in Applied Mathematics from the Weizmann Institute of Science in 2005 and 2009, respectively. His research group combines dynamical systems theory with data analysis to develop realistic data-driven dynamical models, focusing on inference of network architecture and modeling the dynamics of networks. His work is at the interface of computational approaches and biological and physical system modeling, with particular emphasis on neurobiological networks underlying insect sensory systems and neural dynamics of simple organisms.

Research topics

Artificial Intelligence
Computer Science

Selected publications

DiffuMask: Diffusion Language Model for Token-level Prompt Pruning
arXiv (Cornell University) · 2026-04-08
preprint · Open access
In-Context Learning and Chain-of-Thought prompting improve reasoning in large language models (LLMs). These typically come at the cost of longer, more expensive prompts that may contain redundant information. Prompt compression based on pruning offers a practical solution, yet existing methods rely on sequential token removal, which is computationally intensive. We present DiffuMask, a diffusion-based framework that integrates hierarchical shot-level and token-level pruning signals and enables rapid, parallel prompt pruning via iterative mask prediction. DiffuMask substantially accelerates the compression process by masking multiple tokens in each denoising step. It offers tunable control over retained content, preserving essential reasoning context and achieving up to 80% prompt length reduction. Meanwhile, it maintains or improves accuracy across in-domain, out-of-domain, and cross-model settings. Our results show that DiffuMask provides a generalizable and controllable framework for prompt compression, facilitating faster and more reliable in-context reasoning in LLMs.
Advantages of Broadband Metalenses for Generalizable Image Classification
ACS Photonics · 2026-04-28
article
Advantages of Broadband Metalenses for Generalizable Image Classification
Figshare · 2026-04-28
article · Open access
Optical neural networks (ONNs) are gaining increasing attention to accelerate machine learning tasks. In particular, static meta-optical encoders designed for task-specific preprocessing have demonstrated orders of magnitude smaller energy consumption over purely digital counterparts, albeit at the cost of a slight degradation in classification accuracy. However, a lack of generalizability poses serious challenges for wide deployment of static meta-optical front-ends. Here, we investigate the utility of a single-layer metalens as a meta-optical encoder in ONNs for generalizable image classification. Specifically, we show that a visible-spectrum broadband metalens can achieve image classification accuracy comparable to high-end, sensor-limited optics and consistently outperforms the corresponding hyperboloid baseline across a wide range of sensor pixel sizes and digital backends. We further design an end-to-end optimized single-aperture metasurface for ImageNet classification and observe that the optimization tends to balance the modulation transfer function (MTF) across wavelengths within the sensor-detectable passband. Together, these observations suggest that the preservation of spatial-frequency information is an important factor influencing the performance of ONNs. Our results provide physical insight into the process of task-driven optical optimization and offer practical guidance for the design of high-performance ONNs and meta-optical encoders for generalizable computer-vision tasks.
The Global Empirical NTK: Self-Referential Bias and Dimensionality of Gradient Descent Learning
arXiv (Cornell University) · 2026-05-09
preprint · Open access
In training a neural network with gradient descent (GD), each iteration induces a linear operator that governs first-order updates to a model's internal state variables. We define this operator as the Global Empirical Neural Tangent Kernel (NTK). In finite-width networks, the NTK is typically intractable to form, leading prior work to focus on restrictive settings such as tracking outputs only or taking infinite-width limits. Here, we study the structure of the NTK for a range of models. Formulating the model state as the solution to a single global implicit constraint, we derive the NTK as a product of two operators: K, accounting for immediate parameter-to-state interactions, and P, describing internal state-to-state dependencies. For a broad class of weight-based models, including RNNs and transformers, we prove a universal Kronecker-core theorem showing that K admits an exact, computable form given by the Gram matrix of weight-site variables. This core structure reveals that the NTK is structurally bottlenecked, constraining its effective rank and giving rise to a self-referential bias whereby GD preferentially learns within dominant modes of joint hidden and input activity. For recurrent models, we examine the spectrum of the NTK and show when it is biased and low-rank in space or time under the proposed decomposition. We further demonstrate that model dynamics at initialization bias the NTK, restricting learning and preventing task components from being learned effectively. Finally, we show that the NTK associated with a self-attention transformer is likewise structurally constrained to be low-rank. Overall, we show that the NTK possesses tractable structure that explains GD bias toward task solutions and the emergence of low-rank representations. To enable use of the NTK as a practical metric, we build kpflow, a library relying on randomized matrix-free numerical linear algebra.
RPNT: Robust Pre-trained Neural Transformer -- A Pathway for Generalized Motor Decoding
arXiv (Cornell University) · 2026-01-25
preprint · Open access
Brain motor decoding aims to interpret and translate neural activity into behaviors. Decoding models that generalize across variations, such as recordings from different brain sites, experimental sessions, behavior types, and subjects, will be critical for real-world applications. Current decoding models only partially address these challenges. In this work, we develop a pretrained neural transformer model, RPNT (Robust Pretrained Neural Transformer), designed to achieve robust generalization through pretraining, which in turn enables effective finetuning for downstream motor decoding tasks. We arrived at the proposed RPNT architecture by systematically investigating which transformer building blocks are suitable for neural spike activity modeling, since components from models developed for other modalities, such as text and images, do not transfer directly to neural data. The final RPNT architecture incorporates three unique enabling components: 1) a multidimensional rotary positional embedding to aggregate experimental metadata such as site coordinates, session IDs, and behavior types; 2) a context-based attention mechanism, using convolution kernels operating on global attention, to learn local temporal structures for handling non-stationarity of neural population activity; 3) a robust self-supervised learning objective with stochastic causal masking strategies and contrastive representations. We pretrained two versions of RPNT on distinct datasets that present significant generalization challenges: a) a multi-session, multi-task, and multi-subject microelectrode benchmark; b) multi-site recordings using high-density Neuropixel 1.0 probes from many cortical locations. After pretraining, we evaluated RPNT generalization on cross-session, cross-type, cross-subject, and cross-site downstream behavior decoding tasks. Our RPNT consistently outperforms the existing decoding models on these tasks.
ElectroPhysiomeGAN: Generation of Biophysical Neuron Model Parameters from Recorded Electrophysiological Responses
eLife · 2025-08-21
article · Open access · Senior author
Recent advances in connectomics, biophysics, and neuronal electrophysiology warrant modeling of neurons with further details in both network interaction and cellular dynamics. Such models may be referred to as ElectroPhysiome, as they incorporate the connectome and individual neuron electrophysiology to simulate neuronal activities. The nervous system of C. elegans is considered a viable framework for such ElectroPhysiome studies due to advances in connectomics of its somatic nervous system and electrophysiological recordings of neuron responses. In order to achieve a simulated ElectroPhysiome, the set of parameters involved in modeling individual neurons needs to be estimated from electrophysiological recordings. Here, we address this challenge by developing a deep generative estimation method called ElectroPhysiomeGAN (EP-GAN), which, once trained, can instantly generate parameters associated with the Hodgkin-Huxley neuron model (HH-model) for multiple neurons with graded potential response. The method combines a Generative Adversarial Network (GAN) architecture with a Recurrent Neural Network (RNN) encoder and can generate an extensive number of parameters (>170) given the neuron's membrane potential responses and steady-state current profiles. We validate our method by estimating HH-model parameters for 200 simulated neurons with graded membrane potential, followed by 9 experimentally recorded neurons (6 of them newly recorded) in the nervous system of C. elegans. Comparison of EP-GAN with existing estimation methods shows an advantage for EP-GAN in the accuracy of estimated parameters and in inference speed, for both small and large numbers of inferred parameters. In addition, the architecture of EP-GAN permits input with arbitrary clamping protocols, allowing inference of parameters even when only partial membrane potential and steady-state current profiles are given as inputs. EP-GAN is designed to leverage the generative capability of GANs to align with the dynamical structure of the HH-model, and is thus able to achieve such performance.
Modular integration of neural connectomics, dynamics and biomechanics for identification of behavioral sensorimotor pathways in Caenorhabditis elegans.
PubMed · 2025-06-03
preprint · Open access · Senior author
Computational approaches that emulate the in-vivo nervous system are needed to investigate the mechanisms by which the brain orchestrates behavior. Such approaches must integrate a series of biophysical models encompassing the nervous system, muscles, and biomechanics, so that the system can be observed in its entirety while supporting model variations. Here we develop modWorm: a modular modeling framework for the nematode C. elegans. modWorm allows for construction of a model as an integrated series of configurable, exchangeable modules, each describing specific biophysical processes across different modalities. Utilizing modWorm, we propose a base neuro-mechanical model for C. elegans built upon the complete connectome. The model integrates a series of 7 modules: i) intra-cellular dynamics, ii) electrical and iii) chemical extra-cellular neural dynamics, iv) translation of neural activity to muscle calcium dynamics, v) muscle calcium dynamics to muscle forces, vi) muscle forces to body postures, and vii) proprioceptive feedback. We validate the base model by in-silico injection of constant currents into neurons known to be associated with locomotion behaviors and by applying external forces to the body. Applications of in-silico neural stimuli experimentally known to modulate locomotion show that the model can recapitulate natural behavioral responses such as forward and backward locomotion, as well as mid-locomotion responses such as avoidance and turns. Furthermore, through in-silico ablation surveys, the model can infer novel neural circuits involved in sensorimotor behaviors. To further dissect mechanisms of locomotion, we utilize modWorm to introduce empirically based model variations and model optimizations and elucidate their effects on simulated locomotion. Our results show that modWorm can be utilized to identify neural circuits that control, mediate, and generate natural behavior.

Recent grants

Inference of Network Dynamics and Architecture in Neural Systems with Data-Driven Methods
NSF · $879k · 2014–2020
CRCNS Research Proposal: Collaborative Research: Electrophysiome: comprehensive recording and integrated modeling of the C. elegans nervous system
NSF · $318k · 2021–2025

Frequent coauthors

J. Nathan Kutz
31 shared
Kun Su
Xi'an University of Architecture and Technology
19 shared
Xiulong Liu
Tianjin University
16 shared
Jinlin Xiang
11 shared
Jeffrey A. Riffell
University of Washington
11 shared
Jimin Kim
University of Washington
10 shared
Edwin Ding
Azusa Pacific University
7 shared
Julia A. Santos
Hospital das Clínicas da Universidade Federal de Minas Gerais
6 shared

Labs

Eli Shlizerman Lab · PI

Education

B.S., Mathematics and Computer Science
Weizmann Institute of Science
2002
M.S., Applied Mathematics
Weizmann Institute of Science
2005
Ph.D., Applied Mathematics
Weizmann Institute of Science
2009

Awards & honors

NSF Funds A3D3 Institute to Integrate AI into Scientific Res…
