Shinjae Yoo

· Assistant Professor

Stony Brook University · Psychology

Active 1995–2024

h-index 29

Citations 4.4k

Papers 292 · 181 last 5y

Funding—

About

Dr. Shinjae Yoo is a computational scientist working in Computer Science and Math within the Computational Science Initiative at Brookhaven National Laboratory. His research interests include large-scale scientific data mining, text mining, and social media analysis. He holds a Ph.D. and a Master's degree from Carnegie Mellon University, a second Master's degree from Seoul National University in Korea, and a Bachelor's degree from Soong-sil University in Korea. His work centers on computational science and data analysis.

Research topics

Artificial Intelligence
Computer Science
Medicine
Biology
Machine Learning
Computational science
Theoretical computer science
Parallel computing
Clinical psychology
Mathematics
Econometrics
Geography
Bioinformatics
Statistical physics
Psychology
Algorithm
Cartography
Chemistry
Astrophysics
Meteorology
Medical emergency
Computational chemistry
Environmental health
Physics

Selected publications

Quantum Long Short-Term Memory
ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) · 2022 · 219 citations
- Computer Science
- Artificial Intelligence
Long short-term memory (LSTM) is a kind of recurrent neural network (RNN) for modeling sequence and temporal dependency data, and its effectiveness has been extensively established. In this work, we propose a hybrid quantum-classical model of LSTM, which we dub QLSTM. We demonstrate that the proposed model successfully learns several kinds of temporal data. In particular, we show that for certain testing cases, this quantum version of LSTM converges faster, or equivalently, reaches a better accuracy, than its classical counterpart. Due to the variational nature of our approach, the requirements on qubit counts and circuit depth are eased, and our work thus paves the way toward implementing machine learning algorithms for sequence modeling, such as natural language processing and speech recognition, on noisy intermediate-scale quantum (NISQ) devices.
Identification of Novel, Replicable Genetic Risk Loci for Suicidal Thoughts and Behaviors Among US Military Veterans
JAMA Psychiatry · 2022 · 55 citations
- Psychology
- Psychiatry
- Clinical psychology
Importance: Suicide is a leading cause of death; however, the molecular genetic basis of suicidal thoughts and behaviors (SITB) remains unknown. Objective: To identify novel, replicable genomic risk loci for SITB. Design, Setting, and Participants: This genome-wide association study included 633 778 US military veterans with and without SITB, as identified through electronic health records. GWAS was performed separately by ancestry, controlling for sex, age, and genetic substructure. Cross-ancestry risk loci were identified through meta-analysis. Study enrollment began in 2011 and is ongoing. Data were analyzed from November 2021 to August 2022. Main Outcome and Measures: SITB. Results: A total of 633 778 US military veterans were included in the analysis (57 152 [9%] female; 121 118 [19.1%] African ancestry, 8285 [1.3%] Asian ancestry, 452 767 [71.4%] European ancestry, and 51 608 [8.1%] Hispanic ancestry), including 121 211 individuals with SITB (19.1%). Meta-analysis identified more than 200 GWS (P < 5 × 10⁻⁸) cross-ancestry risk single-nucleotide variants for SITB concentrated in 7 regions on chromosomes 2, 6, 9, 11, 14, 16, and 18. Top single-nucleotide variants were largely intronic in nature; 5 were independently replicated in ISGC, including rs6557168 in ESR1, rs12808482 in DRD2, rs77641763 in EXD3, rs10671545 in DCC, and rs36006172 in TRAF3. Associations for FBXL19 and AC018880.2 were not replicated. Gene-based analyses implicated 24 additional GWS cross-ancestry risk genes, including FURIN, TSNARE1, and the NCAM1-TTC12-ANKK1-DRD2 gene cluster. Cross-ancestry enrichment analyses revealed significant enrichment for expression in brain and pituitary tissue, synapse and ubiquitination processes, amphetamine addiction, parathyroid hormone synthesis, axon guidance, and dopaminergic pathways. Seven other unique European ancestry-specific GWS loci were identified, 2 of which (POM121L2 and METTL15/LINC02758) were replicated. Two additional GWS ancestry-specific loci were identified within the African ancestry (PET112/GATB) and Hispanic ancestry (intergenic locus on chromosome 4) subsets, both of which were replicated. No GWS loci were identified within the Asian ancestry subset; however, significant enrichment was observed for axon guidance, cyclic adenosine monophosphate signaling, focal adhesion, glutamatergic synapse, and oxytocin signaling pathways across all ancestries. Within the European ancestry subset, genetic correlations (r > 0.75) were observed between the SITB phenotype and a suicide attempt-only phenotype, depression, and posttraumatic stress disorder. Additionally, polygenic risk score analyses revealed that the Million Veteran Program polygenic risk score had nominally significant main effects in 2 independent samples of veterans of European and African ancestry. Conclusions and Relevance: The findings of this analysis may advance understanding of the molecular genetic basis of SITB and provide evidence for ESR1, DRD2, TRAF3, and DCC as cross-ancestry candidate risk genes. More work is needed to replicate these findings and to determine if and how these genes might impact clinical care.
Use of physics to improve solar forecast: Physics-informed persistence models for simultaneously forecasting GHI, DNI, and DHI
Solar Energy · 2021 · 39 citations
- Physics
- Meteorology
- Statistical physics
Supercomputer-Based Ensemble Docking Drug Discovery Pipeline with Application to Covid-19
Journal of Chemical Information and Modeling · 2020 · 188 citations
- Computer Science
- Artificial Intelligence
We present a supercomputer-driven pipeline for in silico drug discovery using enhanced sampling molecular dynamics (MD) and ensemble docking. Ensemble docking makes use of MD results by docking compound databases into representative protein binding-site conformations, thus taking into account the dynamic properties of the binding sites. We also describe preliminary results obtained for 24 systems involving eight proteins of the proteome of SARS-CoV-2. The MD involves temperature replica exchange enhanced sampling, making use of massively parallel supercomputing to quickly sample the configurational space of protein drug targets. Using the Summit supercomputer at the Oak Ridge National Laboratory, more than 1 ms of enhanced sampling MD can be generated per day. We have ensemble docked repurposing databases to 10 configurations of each of the 24 SARS-CoV-2 systems using AutoDock Vina. Comparison to experiment demonstrates remarkably high hit rates for the top scoring tranches of compounds identified by our ensemble approach. We also demonstrate that, using Autodock-GPU on Summit, it is possible to perform exhaustive docking of one billion compounds in under 24 h. Finally, we discuss preliminary results and planned improvements to the pipeline, including the use of quantum mechanical (QM), machine learning, and artificial intelligence (AI) methods to cluster MD trajectories and rescore docking poses.
Machine learning prediction of incidence of Alzheimer’s disease using large-scale administrative health data
npj Digital Medicine · 2020 · 142 citations
- Machine Learning
- Artificial Intelligence
- Computer Science
= 2026). We trained and validated random forest, support vector machine and logistic regression to predict incident AD in 1, 2, 3, and 4 subsequent years. For predicting future incidence of AD in balanced samples (bootstrapping), the machine learning models showed reasonable performance in 1-year prediction with AUC of 0.775 and 0.759, based on "definite AD" and "probable AD" outcomes, respectively; in 2-year, 0.730 and 0.693; in 3-year, 0.677 and 0.644; in 4-year, 0.725 and 0.683. The results were similar when the entire (unbalanced) samples were used. Important clinical features selected in logistic regression included hemoglobin level, age and urine protein level. This study may shed a light on the utility of the data-driven machine learning model based on large-scale administrative health data in AD risk prediction, which may enable better selection of individuals at risk for AD in clinical trials or early detection in clinical settings.

Frequent coauthors

Jiook Cha
104 shared
Chenxiao Xu
96 shared
Hyoung Seop Kim
National Health Insurance Service
95 shared
Yaakov Stern
Columbia University Irving Medical Center
89 shared
H. Eric Tseng
87 shared
Ji Hwan Park
86 shared
Yun Wang
Beijing Anding Hospital
86 shared
Benedetta Bigio
New York University
81 shared

Education

Ph.D., Computer Science
University of California, Los Angeles
2009
M.S., Computer Science
University of California, Los Angeles
2006
B.S., Computer Science
University of California, Los Angeles
2004
