
Shinjae Yoo
Assistant Professor · Stony Brook University · Psychology
Active 1995–2024
About
Dr. Shinjae Yoo is a computational scientist in Computer Science and Math at the Computational Science Initiative of Brookhaven National Laboratory. His research interests include large-scale scientific data mining, text mining, and social media analysis. He holds a Ph.D. and a Master's degree from Carnegie Mellon University, a second Master's degree from Seoul National University in Korea, and a Bachelor's degree from Soong-sil University in Korea. His work centers on computational science and data analysis.
Research topics
- Artificial Intelligence
- Computer Science
- Medicine
- Biology
- Machine Learning
- Computational science
- Theoretical computer science
- Parallel computing
- Clinical psychology
- Mathematics
- Econometrics
- Geography
- Bioinformatics
- Statistical physics
- Psychology
- Algorithm
- Cartography
- Chemistry
- Astrophysics
- Meteorology
- Medical emergency
- Computational chemistry
- Environmental health
- Physics
Selected publications
Quantum Long Short-Term Memory
ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) · 2022 · 219 citations
- Computer Science
- Artificial Intelligence
Long short-term memory (LSTM) is a kind of recurrent neural network (RNN) for modeling sequence and temporal-dependency data, and its effectiveness has been extensively established. In this work, we propose a hybrid quantum-classical model of LSTM, which we dub QLSTM. We demonstrate that the proposed model successfully learns several kinds of temporal data. In particular, we show that for certain testing cases, this quantum version of LSTM converges faster, or equivalently, reaches a better accuracy, than its classical counterpart. Due to the variational nature of our approach, the requirements on qubit counts and circuit depth are eased, and our work thus paves the way toward implementing machine learning algorithms for sequence modeling, such as natural language processing and speech recognition, on noisy intermediate-scale quantum (NISQ) devices.
JAMA Psychiatry · 2022 · 55 citations
- Psychology
- Psychiatry
- Clinical psychology
Importance: Suicide is a leading cause of death; however, the molecular genetic basis of suicidal thoughts and behaviors (SITB) remains unknown. Objective: To identify novel, replicable genomic risk loci for SITB. Design, Setting, and Participants: This genome-wide association study (GWAS) included 633 778 US military veterans with and without SITB, as identified through electronic health records. GWAS was performed separately by ancestry, controlling for sex, age, and genetic substructure. Cross-ancestry risk loci were identified through meta-analysis. Study enrollment began in 2011 and is ongoing. Data were analyzed from November 2021 to August 2022. Main Outcome and Measures: SITB. Results: A total of 633 778 US military veterans were included in the analysis (57 152 [9%] female; 121 118 [19.1%] African ancestry, 8285 [1.3%] Asian ancestry, 452 767 [71.4%] European ancestry, and 51 608 [8.1%] Hispanic ancestry), including 121 211 individuals with SITB (19.1%). Meta-analysis identified more than 200 genome-wide significant (GWS; P < 5 × 10⁻⁸) cross-ancestry risk single-nucleotide variants for SITB concentrated in 7 regions on chromosomes 2, 6, 9, 11, 14, 16, and 18. Top single-nucleotide variants were largely intronic in nature; 5 were independently replicated in the International Suicide Genetics Consortium (ISGC), including rs6557168 in ESR1, rs12808482 in DRD2, rs77641763 in EXD3, rs10671545 in DCC, and rs36006172 in TRAF3. Associations for FBXL19 and AC018880.2 were not replicated. Gene-based analyses implicated 24 additional GWS cross-ancestry risk genes, including FURIN, TSNARE1, and the NCAM1-TTC12-ANKK1-DRD2 gene cluster. Cross-ancestry enrichment analyses revealed significant enrichment for expression in brain and pituitary tissue, synapse and ubiquitination processes, amphetamine addiction, parathyroid hormone synthesis, axon guidance, and dopaminergic pathways. Seven other unique European ancestry-specific GWS loci were identified, 2 of which (POM121L2 and METTL15/LINC02758) were replicated. Two additional GWS ancestry-specific loci were identified within the African ancestry (PET112/GATB) and Hispanic ancestry (intergenic locus on chromosome 4) subsets, both of which were replicated. No GWS loci were identified within the Asian ancestry subset; however, significant enrichment was observed for axon guidance, cyclic adenosine monophosphate signaling, focal adhesion, glutamatergic synapse, and oxytocin signaling pathways across all ancestries. Within the European ancestry subset, genetic correlations (r > 0.75) were observed between the SITB phenotype and a suicide attempt-only phenotype, depression, and posttraumatic stress disorder. Additionally, polygenic risk score analyses revealed that the Million Veteran Program polygenic risk score had nominally significant main effects in 2 independent samples of veterans of European and African ancestry. Conclusions and Relevance: The findings of this analysis may advance understanding of the molecular genetic basis of SITB and provide evidence for ESR1, DRD2, TRAF3, and DCC as cross-ancestry candidate risk genes. More work is needed to replicate these findings and to determine if and how these genes might impact clinical care.
Solar Energy · 2021 · 39 citations
- Physics
- Meteorology
- Statistical physics
Supercomputer-Based Ensemble Docking Drug Discovery Pipeline with Application to Covid-19
Journal of Chemical Information and Modeling · 2020 · 188 citations
- Computer Science
- Artificial Intelligence
We present a supercomputer-driven pipeline for in silico drug discovery using enhanced sampling molecular dynamics (MD) and ensemble docking. Ensemble docking makes use of MD results by docking compound databases into representative protein binding-site conformations, thus taking into account the dynamic properties of the binding sites. We also describe preliminary results obtained for 24 systems involving eight proteins of the proteome of SARS-CoV-2. The MD involves temperature replica exchange enhanced sampling, making use of massively parallel supercomputing to quickly sample the configurational space of protein drug targets. Using the Summit supercomputer at the Oak Ridge National Laboratory, more than 1 ms of enhanced sampling MD can be generated per day. We have ensemble docked repurposing databases to 10 configurations of each of the 24 SARS-CoV-2 systems using AutoDock Vina. Comparison to experiment demonstrates remarkably high hit rates for the top scoring tranches of compounds identified by our ensemble approach. We also demonstrate that, using Autodock-GPU on Summit, it is possible to perform exhaustive docking of one billion compounds in under 24 h. Finally, we discuss preliminary results and planned improvements to the pipeline, including the use of quantum mechanical (QM), machine learning, and artificial intelligence (AI) methods to cluster MD trajectories and rescore docking poses.
npj Digital Medicine · 2020 · 142 citations
- Machine Learning
- Artificial Intelligence
- Computer Science
= 2026). We trained and validated random forest, support vector machine, and logistic regression models to predict incident AD in 1, 2, 3, and 4 subsequent years. For predicting future incidence of AD in balanced samples (bootstrapping), the machine learning models showed reasonable performance in 1-year prediction, with AUCs of 0.775 and 0.759 based on "definite AD" and "probable AD" outcomes, respectively; in 2-year prediction, 0.730 and 0.693; in 3-year, 0.677 and 0.644; and in 4-year, 0.725 and 0.683. The results were similar when the entire (unbalanced) samples were used. Important clinical features selected in logistic regression included hemoglobin level, age, and urine protein level. This study may shed light on the utility of data-driven machine learning models based on large-scale administrative health data in AD risk prediction, which may enable better selection of individuals at risk for AD in clinical trials or earlier detection in clinical settings.
Frequent coauthors
- Jiook Cha · 104 shared
- Chenxiao Xu · 96 shared
- Hyoung Seop Kim · National Health Insurance Service · 95 shared
- Yaakov Stern · Columbia University Irving Medical Center · 89 shared
- H. Eric Tseng · 87 shared
- Ji Hwan Park · 86 shared
- Yun Wang · Beijing Anding Hospital · 86 shared
- Benedetta Bigio · New York University · 81 shared
Education
- 2009 · Ph.D., Computer Science · University of California, Los Angeles
- 2006 · M.S., Computer Science · University of California, Los Angeles
- 2004 · B.S., Computer Science · University of California, Los Angeles