About
Smita Krishnaswamy's research focuses on machine learning and deep learning methods that incorporate signal processing, data geometry, and topology. Her work aims to enable exploratory analysis, scientific inference, and prediction from large biomedical datasets. She leads the Krishnaswamy Lab at Yale University, where these advanced computational techniques are applied to understand complex biomedical data.
Research topics
- Biology
- Medicine
- Genetics
- Immunology
- Neuroscience
- Computer Science
- Computational biology
- Cell biology
- Pathology
- Cancer research
- Artificial Intelligence
- Biological system
- Mathematics
- Anatomy
- Biochemistry
- Ophthalmology
- Algorithm
- Physics
- Internal medicine
- Endocrinology
Selected publications
Modeling uniquely human gene regulatory function via targeted humanization of the mouse genome
Nature Communications · 29 citations
- Biology
- Genetics
- Cell biology
The evolution of uniquely human traits likely entailed changes in developmental gene regulation. Human Accelerated Regions (HARs), which include transcriptional enhancers harboring a significant excess of human-specific sequence changes, are leading candidates for driving gene regulatory modifications in human development. However, insight into whether HARs alter the level, distribution, and timing of endogenous gene expression remains limited. We examined the role of the HAR HACNS1 (HAR2) in human evolution by interrogating its molecular functions in a genetically humanized mouse model. We find that HACNS1 maintains its human-specific enhancer activity in the mouse embryo and modifies expression of Gbx2, which encodes a transcription factor, during limb development. Using single-cell RNA-sequencing, we demonstrate that Gbx2 is upregulated in the limb chondrogenic mesenchyme of HACNS1 homozygous embryos, supporting that HACNS1 alters gene expression in cell types involved in skeletal patterning. Our findings illustrate that humanized mouse models provide mechanistic insight into how HARs modified gene expression in human evolution.
Beta cell-derived cholecystokinin drives obesity-associated pancreatic adenocarcinoma development
Nature Communications · 2026-02-27 · Article · Open access · Corresponding author
Pancreatic endocrine-exocrine crosstalk plays a key role in normal physiology and disease and can be altered by host metabolic states, such as obesity. Classically, endocrine islet beta (β) cell secretion of insulin is thought to promote the development of obesity-associated pancreatic adenocarcinoma (PDAC), an exocrine cell-derived tumor. Here, we show that β cell expression of the peptide hormone cholecystokinin (CCK) is necessary and sufficient for obesity-associated PDAC progression in mice and that CCK expression - rather than insulin - correlates strongly with enhanced tumorigenesis. Single-cell RNA-sequencing, in silico latent-space archetypal and trajectory analysis, and experimental lineage tracing in vivo reveal that obesity induces the expansion of postnatal immature β cells, which adapt to express CCK via stress-responsive JNK/cJun signaling. Finally, obesity perturbs CCK-dependent peri-islet exocrine cell transcriptional states and enhances islet-proximal tumor formation. These results define endocrine-exocrine CCK signaling as a bona fide driver of obesity-associated PDAC development and uncover avenues to target the endocrine pancreas to subvert exocrine tumorigenesis.
CTR-LoRA: Curvature-Aware and Trust-Region Guided Low-Rank Adaptation for Large Language Models
2026-04-21 · Article · Open access
Parameter-efficient fine-tuning (PEFT) has become the standard approach for adapting large language models under limited compute and memory budgets. Although previous methods improve efficiency through low-rank updates, quantization, or heuristic budget reallocation, they often decouple the allocation of capacity from the way updates evolve during training. In this work, we introduce CTR-LoRA, a curvature-aware, trust-region-guided framework that integrates rank scheduling with stability-aware optimization. CTR-LoRA allocates parameters based on marginal utility derived from lightweight second-order proxies and constrains updates using a Fisher/Hessian-metric trust region. Experiments on multiple open-source backbones (7B-13B), evaluated on both in-distribution and out-of-distribution benchmarks, show consistent improvements over strong PEFT baselines. In addition to increased accuracy, CTR-LoRA enhances training stability, reduces memory requirements, and achieves higher throughput, positioning it on the Pareto frontier of performance and efficiency. These results highlight a principled path toward more robust and deployable PEFT.
PubMed · 2026-05-08 · Article · Senior author
Generating property-optimized mRNA sequences is central to applications such as vaccine design and protein replacement therapy, but remains challenging due to limited data, complex sequence-function relationships, and the narrow space of biologically viable sequences. Generative methods that drift away from the data manifold can yield sequences that fail to fold, translate poorly, or are otherwise nonfunctional. We present RNAGenScape, a property-guided manifold Langevin dynamics framework for mRNA sequence generation that operates directly on a learned manifold of real data. By performing iterative local optimization constrained to this manifold, RNAGenScape preserves biological viability, accesses reliable guidance, and avoids excursions into nonfunctional regions of the ambient sequence space. The framework integrates three components: (1) an autoencoder jointly trained with a property predictor to learn a property-organized latent manifold, (2) a denoising autoencoder that projects updates back onto the manifold, and (3) a property-guided Langevin dynamics procedure that performs optimization along the manifold. Across three real-world mRNA datasets spanning two orders of magnitude in size, RNAGenScape increases median property gain by up to 148% and success rate by up to 30% while ensuring biological viability of generated sequences, and achieves competitive inference efficiency relative to existing generative approaches.
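The update loop described in the abstract above (a property-guided Langevin step followed by a projection back onto the data manifold) can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not the RNAGenScape implementation: the latent space is 2-D, the "manifold" is the unit circle, `project_to_manifold` stands in for the denoising autoencoder, and `property_score` is a hypothetical quadratic property with a hand-written gradient.

```python
import math
import random

def property_score(z, target=(2.0, 0.0)):
    """Hypothetical property predictor: higher the closer z is to `target`."""
    return -((z[0] - target[0]) ** 2 + (z[1] - target[1]) ** 2)

def property_grad(z, target=(2.0, 0.0)):
    """Analytic gradient of the hypothetical property above."""
    return (-2.0 * (z[0] - target[0]), -2.0 * (z[1] - target[1]))

def project_to_manifold(z):
    """Toy stand-in for the denoising projection: snap z onto the unit circle."""
    norm = math.hypot(z[0], z[1])
    if norm == 0.0:
        return (1.0, 0.0)  # arbitrary point on the circle for the degenerate case
    return (z[0] / norm, z[1] / norm)

def guided_langevin(z0, steps=200, step_size=0.05, temperature=0.01, seed=0):
    """Langevin ascent on the property, projected back to the manifold each step."""
    rng = random.Random(seed)
    z = project_to_manifold(z0)
    noise_scale = math.sqrt(2.0 * step_size * temperature)
    for _ in range(steps):
        g = property_grad(z)
        # Gradient step toward higher property values plus Gaussian noise.
        proposal = (
            z[0] + step_size * g[0] + noise_scale * rng.gauss(0.0, 1.0),
            z[1] + step_size * g[1] + noise_scale * rng.gauss(0.0, 1.0),
        )
        # Projection keeps every iterate on the (toy) manifold.
        z = project_to_manifold(proposal)
    return z

if __name__ == "__main__":
    start = (0.0, 1.0)
    end = guided_langevin(start)
    print("start score:", property_score(start))
    print("end score:  ", property_score(end))
```

In this toy setting the projection is what prevents the iterate from drifting toward the off-manifold optimum at `target`; the sampler instead settles near the on-manifold point with the best property value.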
Cancer Research · 2026-04-03 · Article
Pancreatic ductal adenocarcinoma (PDAC) is the third-leading cause of cancer death in the United States with a 5-year survival rate of ∼13%. Obesity is a key PDAC risk factor associated with increased incidence and decreased survival, but the mechanisms by which obesity promotes PDAC development and progression remain unclear. To study how obesity drives PDAC, our lab developed a novel genetically engineered mouse model of obesity-associated PDAC and found that obese mice had significantly increased disease burden relative to lean controls, a phenotype which was abrogated by early induced weight loss. Molecular analyses of the pancreata from obese mice showed marked upregulation of the peptide hormone cholecystokinin (CCK) in β cells of the endocrine pancreas due to stress-responsive JNK/cJun signaling. CCK canonically promotes digestive enzyme release in exocrine acinar cells, the putative PDAC cell-of-origin, and acts as a survival factor in endocrine β cells under conditions of increased insulin demand, such as obesity. Exogenous CCK stimulates acinar cell proliferation and ductal metaplasia, early prerequisite steps in PDAC development. Strikingly, we found that β cell CCK overexpression was sufficient to enhance exocrine tumorigenesis in lean mice, phenocopying the effects of obesity and validating β cell CCK as an independent driver of PDAC development. Conversely, pancreas-specific CCK knockout significantly abrogated exocrine tumorigenesis in obese mice similar to levels seen in lean mice. Critically, tumor burden was significantly positively associated with pancreatic CCK expression and negatively correlated with endogenous insulin production, suggesting that CCK, rather than insulin, drives obesity-associated tumorigenesis.
Finally, treatment of obese mice with GLP-1 receptor agonists (GLP-1RAs), which augment glucose-stimulated insulin secretion and improve β cell health, enhanced β cell function and significantly decreased pancreatic CCK expression. Together, this work has established endocrine-exocrine CCK - rather than insulin - as a critical previously unappreciated mediator of obesity-driven PDAC and enabled the identification of novel translational approaches, including GLP-1RAs, to intercept obesity-associated PDAC development.
Citation Format: Daniel C. McQuaid, Cathy C. Garcia, Aarthi Venkat, Christian F. Ruiz, Christy Zheng, Smita Krishnaswamy, Mandar Deepak Muzumdar. Dysregulation of the islet hormone cholecystokinin drives obesity-associated pancreatic cancer [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2026; Part 1 (Regular Abstracts); 2026 Apr 17-22; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2026;86(7 Suppl):Abstract nr 960.
Self-Supervised Visual Prompting for Cross-Domain Road Damage Detection
2026-03-06 · Article · Open access
The deployment of automated pavement defect detection is often hindered by poor cross-domain generalization. Supervised detectors achieve strong in-domain accuracy but require costly re-annotation for new environments, while standard self-supervised methods capture generic features and remain vulnerable to domain shift. We propose PROBE, a self-supervised framework that visually probes target domains without labels. PROBE introduces a Self-supervised Prompt Enhancement Module (SPEM), which derives defect-aware prompts from unlabeled target data to guide a frozen ViT backbone, and a Domain-Aware Prompt Alignment (DAPA) objective, which aligns prompt-conditioned source and target representations. Experiments on four challenging benchmarks show that PROBE consistently outperforms strong supervised, self-supervised, and adaptation baselines, achieving robust zero-shot transfer, improved resilience to domain variations, and high data efficiency in few-shot adaptation. These results highlight self-supervised prompting as a practical direction for building scalable and adaptive visual inspection systems. Source code is publicly available: https://github.com/xixiaouab/PROBE/tree/main
2025-10-06 · 1 citation · Article · Open access
Identifying functionally important cell states and structure within heterogeneous tumors remains a significant biological and computational challenge. Current clustering- or trajectory-based models are ill-equipped to address the notion that cancer cells reside along a phenotypic continuum. We present Archetypal Analysis network (AAnet), a neural network that learns archetypal states within a phenotypic continuum in single-cell data. Unlike traditional archetypal analysis, AAnet learns archetypes (AT) in a simplex-shaped neural network latent space. Using preclinical and clinical models of breast cancer, AAnet resolves distinct cell states and processes, including cell proliferation, hypoxia, metabolism, and immune interactions. Primary tumor ATs are recapitulated in matched liver, lung, and lymph node metastases. Spatial transcriptomics reveals archetypal organization within the tumor and intra-archetypal mirroring between cancer and adjacent stromal cells. AAnet identifies GLUT3 within the hypoxic AT that proves critical for tumor growth and metastasis. AAnet is a powerful tool, capturing complex, functional cell states from multimodal data.
Significance: Defining critical cell states among cells that reside along a phenotypic continuum is a current biological and computational challenge. In this study, we present AAnet, a neural network that learns archetypal cell states of cancer cells. AAnet defines discrete spatially localized ATs that resolve intratumoral heterogeneity.
2025-08-21 · Peer review · Open access
Volume electron microscopy (vEM) datasets such as those generated for connectome studies allow nanoscale quantifications and comparisons of the cell biological features underpinning circuit architectures. Quantifying cell biological relationships in the connectome yields rich, multidimensional datasets that benefit from data science approaches, including dimensionality reduction and integrated graphical representations of neuronal relationships. We developed NeuroSC (also known as NeuroSCAN), an open-source online platform that bridges sophisticated graph analytics from data science approaches with the underlying cell biological features in the connectome. We analyze a series of published C. elegans brain neuropils and demonstrate how these integrated representations of neuronal relationships facilitate comparisons across connectomes, catalyzing new insights into the structure-function relationships of the circuits and their changes during development. NeuroSC is designed for intuitive examination and comparisons across connectomes, enabling synthesis of knowledge from high-level abstractions of neuronal relationships derived from data science techniques to the detailed identification of the cell biological features underpinning these abstractions.
In vivo differentiation of embryonic cells devoid of key reprogramming factors
Cell Reports · 2025-10-30 · 1 citation · Article · Open access
Embryonic cell differentiation depends on reprogramming of the oocyte and sperm nucleus into a transient totipotent state. In zebrafish, this coincides with genome activation, which is regulated by the pioneer factors Nanog, Pou5f3, and Sox19b (NPS). Here, we investigate the role of NPS in developmental reprogramming and differentiation by analyzing the fate of NPS mutant cells in a wild-type embryo using single-cell RNA-seq. We find that many cells fail to activate transcription or undergo cell death, while others acquire gene expression profiles that resemble germ cells, neural progenitors, and motoneuron states. These cells achieve intermediate transcriptional states, revealing the essential role of NPS in coordinating nuclear and cytoplasmic reprogramming and preventing the premature activation of lineage-specific differentiation programs. These results demonstrate that most developmental programs require developmental reprogramming by NPS, yet some cells can bypass transient totipotency to achieve intermediate developmental states resembling wild-type states in vivo.
Manifold filter-combine networks
Sampling Theory Signal Processing and Data Analysis · 2025-08-05 · Article · Open access
In order to better understand manifold neural networks (MNNs), we introduce Manifold Filter-Combine Networks (MFCNs). Our filter-combine framework parallels the popular aggregate-combine paradigm for graph neural networks (GNNs) and naturally suggests many interesting families of MNNs which can be interpreted as manifold analogues of various popular GNNs. We propose a method for implementing MFCNs on high-dimensional point clouds that relies on approximating an underlying manifold by a sparse graph. We then prove that our method is consistent in the sense that it converges to a continuum limit as the number of data points tends to infinity, and we numerically demonstrate its effectiveness on real-world and synthetic data sets.
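The filter-combine idea on a point cloud, as summarized in the abstract above, might look roughly like the following sketch. It is an illustration under stated assumptions, not the authors' construction: the sparse graph is a symmetric k-nearest-neighbour graph, the filter is the simple low-pass operator I - L/2 (with L the random-walk graph Laplacian), and the combine step is a plain linear mix across channels followed by a ReLU. The function names are hypothetical.

```python
import math

def knn_graph(points, k):
    """Symmetric k-nearest-neighbour graph, returned as a list of neighbour sets."""
    n = len(points)
    nbrs = [set() for _ in range(n)]
    for i in range(n):
        dists = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        for _, j in dists[:k]:
            nbrs[i].add(j)
            nbrs[j].add(i)  # symmetrize the graph
    return nbrs

def low_pass_filter(nbrs, signal):
    """Apply I - L/2, i.e. average each value with the mean over its neighbours."""
    out = []
    for i, ns in enumerate(nbrs):
        neighbour_mean = sum(signal[j] for j in ns) / len(ns)
        out.append(0.5 * signal[i] + 0.5 * neighbour_mean)
    return out

def filter_combine_layer(nbrs, features, weights):
    """One layer: filter each channel on the graph, then mix channels and apply ReLU.

    features: n rows of c_in values; weights: c_in rows of c_out values.
    """
    n, c_in, c_out = len(features), len(weights), len(weights[0])
    # Filter step: each input channel is filtered independently.
    filtered = [
        low_pass_filter(nbrs, [row[c] for row in features]) for c in range(c_in)
    ]
    # Combine step: linear mix across channels, then a pointwise nonlinearity.
    out = []
    for i in range(n):
        row = []
        for o in range(c_out):
            value = sum(filtered[c][i] * weights[c][o] for c in range(c_in))
            row.append(max(0.0, value))
        out.append(row)
    return out

if __name__ == "__main__":
    # Eight points sampled from a circle, a 1-D manifold embedded in the plane.
    pts = [(math.cos(2 * math.pi * t / 8), math.sin(2 * math.pi * t / 8)) for t in range(8)]
    graph = knn_graph(pts, k=2)
    feats = [[p[0]] for p in pts]  # one input channel: the x-coordinate
    print(filter_combine_layer(graph, feats, [[1.0, -1.0]]))
```

A richer variant would swap the fixed low-pass operator for a family of spectral filters; the fixed operator is used here only to keep the sketch short.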
Recent grants
NIH · $2.8M · 2020–2025
NIH · $2.0M · 2019–2025
CAREER: Deep representation learning for exploration and inference in biomedical data
NSF · $586k · 2021–2026
Multiscale data geometric networks for learning representations and dynamics of biological systems
NSF · $498k · 2023–2026
Frequent coauthors
- Guy Wolf · 153 shared
- Alexander Tong · 76 shared
- Dennis Shung · 51 shared
- David van Dijk · Yale University · 48 shared
- Daniel B. Burkhardt · 43 shared
- Manik Kuchroo · Yale University · 41 shared
- Ramesh Batra · Yale University · 39 shared
- Jessie Huang · Drexel University · 38 shared
Education
- 2008 · Ph.D., Electrical Engineering and Computer Science · University of Michigan