
Tengfei Ma
Assistant Professor · Stony Brook University · Biomedical Informatics
Active 2010–2026
About
Tengfei Ma is an Assistant Professor in the Department of Biomedical Informatics at Stony Brook University. His research focuses on machine learning, natural language processing, and artificial intelligence applications in healthcare. He is based in MART 7M-0810, Stony Brook, NY, and can be contacted at Tengfei.Ma@stonybrookmedicine.edu.
Research topics
- Computer Science
- Artificial Intelligence
- Machine Learning
- Data Mining
- Theoretical computer science
- Biology
- Bioinformatics
- Mathematics
- Pharmacology
- Geometry
- Computational biology
- Combinatorics
- Internal medicine
- Medicine
Selected publications
Robotics and Autonomous Systems · 2026-05-03 · Article
Wasserstein Graph Neural Networks for Graphs With Missing Attributes
IEEE Transactions on Pattern Analysis and Machine Intelligence · 2025-05-08 · 1 citation · Article
Missing node attributes are a common problem in real-world graphs and degrade the representation learning of graph neural networks (GNNs). Existing GNNs often struggle to effectively leverage incomplete attribute information, as they are not specifically designed for graphs with missing attributes. To address this issue, we propose a novel node representation learning framework called Wasserstein Graph Neural Network (WGNN). Our approach aims to maximize the utility of limited observed attribute information and account for uncertainty caused by missing values. We achieve this by representing nodes as low-dimensional distributions obtained through attribute matrix decomposition. Additionally, we enhance representation expressiveness by introducing a novel message-passing scheme that aggregates distributional information from neighboring nodes in the Wasserstein space. We evaluate the performance of WGNN in node classification tasks using both synthetic and real-world datasets under two missing-attribute scenarios. Moreover, we demonstrate the applicability of WGNN in recovering missing values and tackling matrix completion problems, specifically in graphs involving users and items. Experimental results on both tasks convincingly demonstrate the superiority of our proposed method.
Dual-Pathway Fusion of EHRs and Knowledge Graphs for Predicting Unseen Drug-Drug Interactions
arXiv.org · 2025-11-10 · Preprint · Open access · Senior author
Drug-drug interactions (DDIs) remain a major source of preventable harm, and many clinically important mechanisms are still unknown. Existing models either rely on pharmacologic knowledge graphs (KGs), which fail on unseen drugs, or on electronic health records (EHRs), which are noisy, temporal, and site-dependent. We introduce, to our knowledge, the first system that conditions KG relation scoring on patient-level EHR context and distills that reasoning into an EHR-only model for zero-shot inference. A fusion "Teacher" learns mechanism-specific relations for drug pairs represented in both sources, while a distilled "Student" generalizes to new or rarely used drugs without KG access at inference. Both operate under a shared ontology (set) of pharmacologic mechanisms (drug relations) to produce interpretable, auditable alerts rather than opaque risk scores. Trained on a multi-institution EHR corpus paired with a curated DrugBank DDI graph, and evaluated using a clinically aligned, decision-focused protocol with leakage-safe negatives that avoid artificially easy pairs, the system maintains precision across multi-institution test data, produces mechanism-specific, clinically consistent predictions, reduces false alerts (higher precision) at comparable overall detection performance (F1), and misses fewer true interactions compared to prior methods. Case studies further show zero-shot identification of clinically recognized CYP-mediated and pharmacodynamic mechanisms for drugs absent from the KG, supporting real-world use in clinical decision support and pharmacovigilance.
Bank Credit and Trade Credit: A New Perspective from Bank Regulatory Penalties
SSRN Electronic Journal · 2025-01-01 · Preprint · Open access
Uncovering LLM-Generated Code: A Zero-Shot Synthetic Code Detector via Code Rewriting
Proceedings of the AAAI Conference on Artificial Intelligence · 2025-04-11 · 6 citations · Article · Open access
Large Language Models (LLMs) have demonstrated remarkable proficiency in generating code. However, the misuse of LLM-generated (synthetic) code has raised concerns in both educational and industrial contexts, underscoring the urgent need for synthetic code detectors. Existing methods for detecting synthetic content are primarily designed for general text and struggle with code due to the unique grammatical structure of programming languages and the presence of numerous "low-entropy" tokens. To address this gap, our work proposes a novel zero-shot synthetic code detector based on the similarity between the original code and its LLM-rewritten variants. Our method is based on the observation that differences between LLM-rewritten and original code tend to be smaller when the original code is synthetic. We utilize self-supervised contrastive learning to train a code similarity model and evaluate our approach on two synthetic code detection benchmarks. Our results demonstrate a significant improvement over existing state-of-the-art (SOTA) synthetic content detectors, delivering notable gains in both performance and robustness on the APPS and MBPP benchmarks.
Expert Systems with Applications · 2025-09-02 · Article
Predicting Drug–Drug Interaction via Dual-Drug Visual Representation
Journal of Chemical Information and Modeling · 2025-09-26 · 2 citations · Article · Corresponding author
Drug-drug interaction (DDI) prediction is essential for ensuring medication safety and therapeutic efficacy. While existing models often rely on chemical descriptors or molecular graphs, they tend to overlook the rich spatial and structural cues embedded in molecular images. To address this issue, we propose DDVR-DDI, a novel vision-based framework that predicts DDIs by encoding drug pairs as a single fused molecular image, enabling direct modeling of their potential interaction interface. To enhance representation learning of visual drug pairs, we introduce a two-stage self-supervised pretraining strategy: a position-invariant contrastive task helps the model recognize the same drug pair under different spatial arrangements, while a jigsaw puzzle task encourages fine-grained structural understanding. Additionally, we develop a multi-expert voting mechanism, where multiple models analyze distinct augmented views of each drug pair to boost prediction accuracy and stability through ensemble inference. Extensive experiments on benchmark DDI data sets show that our model achieves state-of-the-art performance. To further interpret its predictions, we employ Grad-CAM visualizations and conduct multiple experiments to validate the stability and interpretability of the model; furthermore, we conduct a case study on ritonavir inhibition of CYP3A, revealing that our model consistently highlights chemically significant substructures. These findings underscore the potential of image-based modeling for both accurate prediction and mechanistic insight in drug interaction research.
Study on flow control strategy of a fuel valve directly-driven by two-phase hybrid stepper motor
Flow Measurement and Instrumentation · 2025-07-16 · 1 citation · Article · Senior author
Enhancing Graph Representation Learning with Localized Topological Features
arXiv.org · 2025-01-15 · Preprint · Open access
Representation learning on graphs is a fundamental problem underlying many downstream tasks. Graph neural networks, the dominant approach for graph representation learning, are limited in their representation power. Therefore, it can be beneficial to explicitly extract and incorporate high-order topological and geometric information into these models. In this paper, we propose a principled approach to extract the rich connectivity information of graphs based on the theory of persistent homology. Our method utilizes the topological features to enhance the representation learning of graph neural networks and achieves state-of-the-art performance on various node classification and link prediction benchmarks. We also explore the option of end-to-end learning of the topological features, i.e., treating topological computation as a differentiable operator during learning. Our theoretical analysis and empirical study provide insights and potential guidelines for employing topological features in graph learning tasks.
Frequent coauthors
- Cao Xiao · 27 shared
- Lingfei Wu · 26 shared
- Shouling Ji · Zhejiang University · 22 shared
- Jimeng Sun · 17 shared
- Swarnadeep Saha · 16 shared
- V. Lakshma Reddy · University of Sydney · 16 shared
- Rishi Arora · 16 shared
- Chul Sung · 16 shared
Education
- 2008 · Ph.D., Computer Science · University of California, Los Angeles
- 2003 · M.S., Computer Science · University of California, Los Angeles
- 2001 · B.S., Computer Science · University of Science and Technology of China
Awards & honors
- ISWC 2021 Best Paper Award of Research Track
- IBM Outstanding Research Accomplishment 2019
- IBM Outstanding Research Accomplishment 2022