
Xihong Lin
Chair, Department of Statistics; Professor of Statistics; Professor of Biostatistics, Harvard T.H. Chan School of Public Health · Member, National Academy of Sciences · Member, National Academy of Medicine
Harvard University · Biostatistics
Active 1992–2024
About
Xihong Lin is a Professor of Statistics and the Department Chair at Harvard University, with an additional appointment as a Professor of Biostatistics at the Harvard T.H. Chan School of Public Health. She is a member of the National Academy of Sciences and the National Academy of Medicine. Her research interests include scalable statistical inference for big data, statistical machine learning and artificial intelligence, integrative data analysis, causal inference and mediation analysis, statistical genetics and genomics, analysis of complex observational data, and statistical cloud computing. She develops statistical methods and applies them across a range of scientific fields.
Research topics
- Genetics
- Biology
- Demography
- Medicine
- Computer Science
- Internal medicine
- Environmental health
- Gerontology
- Computational biology
- Evolutionary biology
- Geography
- Family medicine
- Endocrinology
- Virology
- Psychology
- Pathology
- Bioinformatics
- Cartography
- Surgery
Selected publications
Nature Genetics · 2022 · 354 citations
- Biology
- Genetics
Whole genome sequence analysis of blood lipid levels in >66,000 individuals
Nature Communications · 2022 · 73 citations
- Genetics
- Biology
- Computational biology
Blood lipids are heritable, modifiable causal factors for coronary artery disease. Despite well-described monogenic and polygenic bases of dyslipidemia, limitations remain in discovery of lipid-associated alleles using whole genome sequencing (WGS), partly due to limited sample sizes, ancestral diversity, and interpretation of clinical significance. Among 66,329 ancestrally diverse (56% non-European) participants, we associate 428M variants from deep-coverage WGS with lipid levels; ~400M variants were not assessed in prior lipids genetic analyses. We find multiple lipid-related genes strongly associated with blood lipids through analysis of common and rare coding variants. We discover several associated rare non-coding variants, largely at Mendelian lipid genes. Notably, we observe rare LDLR intronic variants associated with markedly increased LDL-C, similar to rare LDLR exonic variants. In conclusion, we conducted a systematic whole genome scan for blood lipids, expanded the alleles linked to lipids for multiple ancestries, and characterized a clinically relevant rare non-coding variant model for lipids.
Genome Medicine · 2021 · 34 citations
- Genetics
- Medicine
- Biology
BACKGROUND: Sleep-disordered breathing is a common disorder associated with significant morbidity. The genetic architecture of sleep-disordered breathing remains poorly understood. Through the NHLBI Trans-Omics for Precision Medicine (TOPMed) program, we performed the first whole-genome sequence analysis of sleep-disordered breathing. METHODS: The study sample comprised 7988 individuals of diverse ancestry. Common-variant and pathway analyses included an additional 13,257 individuals. We examined five complementary traits describing different aspects of sleep-disordered breathing: the apnea-hypopnea index, average oxyhemoglobin desaturation per event, average and minimum oxyhemoglobin saturation across the sleep episode, and the percentage of sleep with oxyhemoglobin saturation < 90%. We adjusted for age, sex, BMI, study, and family structure using MMSKAT and EMMAX mixed linear model approaches. Additional bioinformatics analyses were performed with MetaXcan, GIGSEA, and ReMap. RESULTS: … on chromosome X with ARMCX3. Additional rare-variant associations include ARMCX3-AS1, MRPS33, and C16orf90. Novel common-variant loci were identified in the NRG1 and SLC45A2 regions, and previously associated loci in the IL18RAP and ATP2B4 regions were associated with novel phenotypes. Transcription factor binding site enrichment identified associations with genes implicated in respiratory and craniofacial traits. Additional analyses identified significantly associated pathways. CONCLUSIONS: We have identified the first gene-based rare-variant associations with objectively measured sleep-disordered breathing traits. Our results increase the understanding of the genetic architecture of sleep-disordered breathing and highlight associations in genes that modulate lung development, inflammation, respiratory rhythmogenesis, and HIF1A-mediated hypoxic response.
Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program
Nature · 2021 · 2261 citations
- Computer Science
- Biology
- Genetics
In the first 53,831 TOPMed samples, we detected more than 400 million single-nucleotide and insertion or deletion variants after alignment with the reference genome. Additional previously undescribed variants were detected through assembly of unmapped reads and customized analysis in highly variable loci. Among the more than 400 million detected variants, 97% have frequencies of less than 1% and 46% are singletons that are present in only one individual (53% among unrelated individuals). These rare variants provide insights into mutational processes and recent human evolutionary history. The extensive catalogue of genetic variation in TOPMed studies provides unique opportunities for exploring the contributions of rare and noncoding sequence variants to phenotypic variation. Furthermore, combining TOPMed haplotypes with modern imputation methods improves the power and reach of genome-wide association studies to include variants down to a frequency of approximately 0.01%.
Population-scale longitudinal mapping of COVID-19 symptoms, behaviour and testing
Nature Human Behaviour · 2020 · 129 citations
Senior author · Corresponding author
- Computer Science
- Medicine
- Demography
Proceedings of the National Academy of Sciences · 2020 · 110 citations
- Genetics
- Biology
- Evolutionary biology
… which suggest that variation in DNM rate is significantly shaped by nonadditive genetic effects and the environment.
Inherited causes of clonal haematopoiesis in 97,691 whole genomes
Nature · 2020 · 726 citations
- Biology
- Genetics
The Lancet Public Health · 2020 · 2291 citations
- Medicine
- Demography
- Family medicine
BACKGROUND: Data for front-line health-care workers and risk of COVID-19 are limited. We sought to assess risk of COVID-19 among front-line health-care workers compared with the general community and the effect of personal protective equipment (PPE) on risk. METHODS: We did a prospective, observational cohort study in the UK and the USA of the general community, including front-line health-care workers, using self-reported data from the COVID Symptom Study smartphone application (app) from March 24 (UK) and March 29 (USA) to April 23, 2020. Participants were voluntary users of the app and at first use provided information on demographic factors (including age, sex, race or ethnic background, height and weight, and occupation) and medical history, and subsequently reported any COVID-19 symptoms. We used Cox proportional hazards modelling to estimate multivariate-adjusted hazard ratios (HRs) of our primary outcome, which was a positive COVID-19 test. The COVID Symptom Study app is registered with ClinicalTrials.gov, NCT04331509. FINDINGS: Among 2 035 395 community individuals and 99 795 front-line health-care workers, we recorded 5545 incident reports of a positive COVID-19 test over 34 435 272 person-days. Compared with the general community, front-line health-care workers were at increased risk for reporting a positive COVID-19 test (adjusted HR 11·61, 95% CI 10·93-12·33). To account for differences in testing frequency between front-line health-care workers and the general community and possible selection bias, an inverse probability-weighted model was used to adjust for the likelihood of receiving a COVID-19 test (adjusted HR 3·40, 95% CI 3·37-3·43). Secondary and post-hoc analyses suggested adequacy of PPE, clinical setting, and ethnic background were also important factors. INTERPRETATION: In the UK and the USA, risk of reporting a positive test for COVID-19 was increased among front-line health-care workers. 
Health-care systems should ensure adequate availability of PPE and develop additional strategies to protect health-care workers from COVID-19, particularly those from Black, Asian, and minority ethnic backgrounds. Additional follow-up of these observational findings is needed. FUNDING: Zoe Global, Wellcome Trust, Engineering and Physical Sciences Research Council, National Institutes of Health Research, UK Research and Innovation, Alzheimer's Society, National Institutes of Health, National Institute for Occupational Safety and Health, and Massachusetts Consortium on Pathogen Readiness.
JAMA · 2020 · 1913 citations
- Medicine
- Environmental health
- Demography
IMPORTANCE: Coronavirus disease 2019 (COVID-19) has become a pandemic, and it is unknown whether a combination of public health interventions can improve control of the outbreak. OBJECTIVE: To evaluate the association of public health interventions with the epidemiological features of the COVID-19 outbreak in Wuhan by 5 periods according to key events and interventions. DESIGN, SETTING, AND PARTICIPANTS: In this cohort study, individual-level data on 32 583 laboratory-confirmed COVID-19 cases reported between December 8, 2019, and March 8, 2020, were extracted from the municipal Notifiable Disease Report System, including patients' age, sex, residential location, occupation, and severity classification. EXPOSURES: Nonpharmaceutical public health interventions including cordons sanitaire, traffic restriction, social distancing, home confinement, centralized quarantine, and universal symptom survey. MAIN OUTCOMES AND MEASURES: Rates of laboratory-confirmed COVID-19 infections (defined as the number of cases per day per million people), across age, sex, and geographic locations were calculated across 5 periods: December 8 to January 9 (no intervention), January 10 to 22 (massive human movement due to the Chinese New Year holiday), January 23 to February 1 (cordons sanitaire, traffic restriction and home quarantine), February 2 to 16 (centralized quarantine and treatment), and February 17 to March 8 (universal symptom survey). The effective reproduction number of SARS-CoV-2 (an indicator of secondary transmission) was also calculated over the periods. RESULTS: Among 32 583 laboratory-confirmed COVID-19 cases, the median patient age was 56.7 years (range, 0-103; interquartile range, 43.4-66.8) and 16 817 (51.6%) were women. The daily confirmed case rate peaked in the third period and declined afterward across geographic regions and sex and age groups, except for children and adolescents, whose rate of confirmed cases continued to increase. 
The daily confirmed case rate over the whole period in local health care workers (130.5 per million people [95% CI, 123.9-137.2]) was higher than that in the general population (41.5 per million people [95% CI, 41.0-41.9]). The proportion of severe and critical cases decreased from 53.1% to 10.3% over the 5 periods. The severity risk increased with age: compared with those aged 20 to 39 years (proportion of severe and critical cases, 12.1%), elderly people (≥80 years) had a higher risk of having severe or critical disease (proportion, 41.3%; risk ratio, 3.61 [95% CI, 3.31-3.95]) while younger people (<20 years) had a lower risk (proportion, 4.1%; risk ratio, 0.47 [95% CI, 0.31-0.70]). The effective reproduction number fluctuated above 3.0 before January 26, decreased to below 1.0 after February 6, and decreased further to less than 0.3 after March 1. CONCLUSIONS AND RELEVANCE: A series of multifaceted public health interventions was temporally associated with improved control of the COVID-19 outbreak in Wuhan, China. These findings may inform public health policy in other countries and regions.
Recent grants
NIH · $2.5M · 2016
Integrative Analysis of Lung Cancer Etiology and Risk
NIH · $37.0M · 2017–2029
NIH · $1.1M · 2007
NIH · $2.0M · 2022–2026
NIH · $262k · 2017
Frequent coauthors
- 864 shared
Loïc Le Marchand
Cancer Center of Hawaii
- 852 shared
Peter Kraft
- 850 shared
Mattias Johansson
Centre International de Recherche sur le Cancer
- 850 shared
Demetrius Albanes
- 849 shared
Stephen J. Chanock
- 845 shared
Christopher A. Haiman
- 845 shared
Meir J. Stampfer
The Technological College of Beer Sheva
- 845 shared
Victoria L. Stevens
Mayo Clinic in Arizona
Education
- 1994
Ph.D., Statistics
Harvard University
- 1991
M.S., Statistics
Harvard University
- 1987
B.S., Mathematics
University of Science and Technology of China
Awards & honors
- Member, National Academy of Sciences
- Member, National Academy of Medicine