David Hunter

Professor of Statistics, Graduate Faculty, Social Data Analytics, C-SoDA Faculty Affiliate · Verified

Pennsylvania State University · Social Data Analytics

Active 1940–2026

h-index: 43
Citations: 12.8k
Papers: 157 (15 in the last 5 years)
Funding: $1.4M

About

David Hunter is a Professor of Statistics and a member of the Graduate Faculty at the Pennsylvania State University. He is also a C-SoDA (Social Data Analytics) Faculty Affiliate. His office is at 302 Pond Laboratory, University Park, PA 16802. More information about his academic profile and work is available on his website: http://sites.stat.psu.edu/~dhunter/.

Research topics

  • Data Mining
  • Computer Science
  • Machine Learning
  • Artificial Intelligence
  • Programming Language
  • Mathematics
  • Theoretical Computer Science
  • Statistics
  • Data Science

Selected publications

  • Taking Stock

    2026-03-07

    Book chapter · First author · Corresponding author

    This chapter examines the promises and pitfalls of using statistics in discussions about inclusivity and bias. Through three illustrative vignettes (on college admissions modeling, blind orchestra auditions, and academic science hiring), the chapter explores the limits of statistical inference in socially complex settings. It highlights the importance of model interpretability in high-stakes decision-making, the dangers of conflating correlation with causation, and the challenges of drawing conclusions from incomplete or unrepresentative data. The author argues that, while statistical tools can enhance clarity and foster informed debate, they are often misunderstood or misapplied in ways that obscure rather than illuminate the truth. The chapter advocates a common-sense, ethically grounded approach to data interpretation, especially when the societal stakes are high. It concludes that statistical reasoning must remain transparent and humble, mindful of its limitations in measuring nuanced social realities.

  • A chromosome-level reference genome for the critically endangered Southern Corroboree frog (Pseudophryne corroboree)

    Wellcome Open Research · 2025-04-30 · 1 citation

    Preprint · Open access

    The Southern Corroboree frog (Pseudophryne corroboree; Anura; Myobatrachidae) is a Critically Endangered amphibian, according to the IUCN, and is endemic to the Snowy Mountains region of Kosciuszko National Park in New South Wales, Australia. This species has been driven to functional extinction by the introduction of the fungal disease chytridiomycosis. Here we provide the first reference genome for P. corroboree. Using PacBio HiFi sequencing, Arima Hi-C, and Bionano optical mapping, we produced a chromosome-level genome assembly. Additionally, we generated a reference transcriptome based on multiple tissues from both male and female individuals to support genome annotation. The resulting genome spans 8.87 Gb across 12 chromosomes, with a contig N50 of 6.8 Mb. This research provides a phased, annotated genome assembly along with transcriptomic resources to facilitate future conservation genomic studies of P. corroboree. Furthermore, the genome offers an invaluable resource for taxonomic and evolutionary research, particularly given that the nearest available chromosome-level reference genome is from Mixophyes fleayi, a species that last shared a common ancestor with P. corroboree 80 million years ago.
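
    The contig N50 reported above is a standard measure of assembly contiguity: the largest length L such that contigs of length L or longer together cover at least half of the total assembled length. A minimal Python sketch of that definition, using a hypothetical `n50` helper and made-up contig lengths (not data from the paper):

```python
def n50(lengths):
    """Return the N50 of a list of contig lengths.

    N50 is the largest length L such that contigs of length >= L
    account for at least half of the total assembled length.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest, accumulating covered length.
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first contig that brings coverage to at least half.
        if 2 * running >= total:
            return length


# Toy example: total = 100, and 40 + 25 = 65 >= 50, so N50 = 25.
print(n50([40, 25, 15, 10, 10]))  # 25
```

    Sorting once and scanning keeps the sketch simple; real assembly tools report the same statistic from their contig tables.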

  • A Regression Framework for Studying Relationships among Attributes under Network Interference

    Journal of the American Statistical Association · 2025-10-01

    Article · Senior author

  • A regression framework for studying relationships among attributes under network interference

    arXiv (Cornell University) · 2024-10-10

    Preprint · Open access · Senior author

    To understand how the interconnected and interdependent world of the twenty-first century operates and make model-based predictions, joint probability models for networks and interdependent outcomes are needed. We propose a comprehensive regression framework for networks and interdependent outcomes with multiple advantages, including interpretability, scalability, and provable theoretical guarantees. The regression framework can be used for studying relationships among attributes of connected units and captures complex dependencies among connections and attributes, while retaining the virtues of linear regression, logistic regression, and other regression models by being interpretable and widely applicable. On the computational side, we show that the regression framework is amenable to scalable statistical computing based on convex optimization of pseudo-likelihoods using minorization-maximization methods. On the theoretical side, we establish convergence rates for pseudo-likelihood estimators based on a single observation of dependent connections and attributes. We demonstrate the regression framework using simulations and an application to hate speech on the social media platform X.
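
    The minorization-maximization (MM) approach mentioned above rests on a general ascent argument, written here in generic notation: ℓ is the objective being maximized (for example, a log pseudo-likelihood) and g is a surrogate built at the current iterate. This is the standard MM principle, not the particular minorizer constructed in the paper.

```latex
\begin{aligned}
&\text{Minorization: } g(\theta \mid \theta^{(t)}) \le \ell(\theta) \ \text{for all } \theta,
\qquad g(\theta^{(t)} \mid \theta^{(t)}) = \ell(\theta^{(t)}),\\
&\text{Maximization: } \theta^{(t+1)} = \arg\max_{\theta}\, g(\theta \mid \theta^{(t)}),\\
&\text{Ascent: } \ell(\theta^{(t+1)}) \ \ge\ g(\theta^{(t+1)} \mid \theta^{(t)}) \ \ge\ g(\theta^{(t)} \mid \theta^{(t)}) \ =\ \ell(\theta^{(t)}).
\end{aligned}
```

    When the surrogate is convex-friendly (for example, separable in θ), each maximization step is easy, which is what makes MM attractive for scalable computation.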

  • Modeling Homophily in Exponential-Family Random Graph Models for Bipartite Networks

    arXiv (Cornell University) · 2023-12-09

    Preprint · Open access

    Homophily, the tendency of individuals who are alike to form ties with one another, is an important concept in the study of social networks. Yet accounting for homophily effects is complicated in bipartite networks, where ties connect individuals not with one another but with a separate set of nodes, which might also be individuals but are often an entirely different type of object. As a result, much work on homophily in bipartite networks proceeds by first eliminating the bipartite structure, collapsing a two-mode network to a one-mode network and thereby ignoring potentially meaningful structure in the data. We introduce a set of methods to model homophily on bipartite networks without losing information in this way, and we demonstrate that these methods allow for substantively interesting findings in management science that are not possible using standard techniques. These methods are implemented in the widely used ergm package for R.
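
    The information loss from collapsing a two-mode network can be seen in a toy case. The Python sketch below, with a hypothetical `project` helper and invented person/club data, builds the usual one-mode projection (two people are tied if they share at least one club) and shows that two different bipartite networks can collapse to the same projection. It only illustrates the motivation; it is not the paper's homophily method.

```python
from itertools import combinations


def project(bipartite_edges):
    """Collapse a two-mode (actor, event) edge list to a one-mode actor network.

    Two actors are tied in the projection if they share at least one event.
    Ties are returned as sorted (actor, actor) pairs.
    """
    members = {}
    for actor, event in bipartite_edges:
        members.setdefault(event, set()).add(actor)
    ties = set()
    for actors in members.values():
        for a, b in combinations(sorted(actors), 2):
            ties.add((a, b))
    return ties


# One club containing all three people...
one_club = [("A", "X"), ("B", "X"), ("C", "X")]
# ...versus three clubs, each shared by a different pair of people.
three_clubs = [("A", "X"), ("B", "X"), ("B", "Y"), ("C", "Y"), ("A", "Z"), ("C", "Z")]

# Both collapse to the same triangle A-B, A-C, B-C.
print(project(one_club) == project(three_clubs))  # True
```

    The two bipartite networks differ (one shared context versus three pairwise ones), yet any analysis run on the projection cannot tell them apart.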

  • A dynamic additive and multiplicative effects network model with application to the United Nations voting behaviors

    The Annals of Applied Statistics · 2023-10-31 · 3 citations

    Article · Open access

    Motivated by a study of United Nations voting behaviors, we introduce a regression model for a series of networks that are correlated over time. Our model is a dynamic extension of the additive and multiplicative effects network model (AMEN) of Hoff (2021). In addition to incorporating a temporal structure, the model accommodates two types of missing data and thus allows the size of the network to vary over time. We demonstrate via simulations the necessity of the model's various components. We apply the model to United Nations General Assembly voting data from 1983 to 2014 (Voeten, 2013) to answer research questions about international voting behavior. In addition to identifying important factors that could explain the voting behavior, the model-estimated additive effects, multiplicative effects, and their movements over time reveal meaningful foreign policy positions and alliances of various countries.
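
    For reference, one common way to write the static additive and multiplicative effects model of Hoff (2021) for a relation from actor i to actor j is shown below, where x_ij are dyadic covariates, a_i and b_j are additive sender and receiver effects, and u_i^T v_j is a multiplicative (latent-factor) effect. The dynamic extension in this paper adds temporal structure and missing-data handling whose exact form is not reproduced here.

```latex
z_{ij} = \beta^{\top} x_{ij} + a_i + b_j + u_i^{\top} v_j + \varepsilon_{ij}
```

    For binary outcomes, z_ij is typically treated as a latent variable, with the observed tie present when z_ij > 0.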

  • Computing Pseudolikelihood Estimators for Exponential-Family Random Graph Models

    Journal of Data Science · 2023-01-01 · 5 citations

    Article · Open access · Senior author · Corresponding author

    The reputation of the maximum pseudolikelihood estimator (MPLE) for exponential random graph models (ERGMs) has changed drastically over the past 30 years. While it first received broad support, mainly due to its computational feasibility and the lack of alternatives, general opinion began to shift with the introduction of approximate maximum likelihood estimator (MLE) methods, which became practicable thanks to increasing computing power and the introduction of MCMC methods. Previous comparison studies appear to yield contradictory results regarding the preference between these two point estimators; however, there is consensus that the prevailing method of obtaining an MPLE's standard error from the inverse Hessian matrix generally underestimates standard errors. We propose replacing the inverse Hessian matrix with an approximation of the Godambe matrix, which yields confidence intervals with appropriate coverage rates and, in addition, makes it possible to check for model degeneracy. Our results also provide empirical evidence for the asymptotic normality of the MPLE under certain conditions.
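
    For context, the ERGM pseudolikelihood treats each dyad's tie indicator as a logistic regression on its change statistics, so the MPLE can be computed with an ordinary logistic-regression fit. The Python sketch below assumes a hypothetical undirected model with edges and triangles terms on a toy graph; the helper names `change_stats` and `mple` are illustrative. It reports the inverse-Hessian standard errors that the abstract describes as the prevailing (and typically too small) choice; the Godambe-matrix correction the paper proposes is not implemented here.

```python
import math
from itertools import combinations


def change_stats(n, edges):
    """For every dyad (i, j) with i < j, return (y_ij, [d_edges, d_triangles]).

    d_triangles counts the common neighbours of i and j, i.e. the number
    of triangles that turning the i-j tie on would create.
    """
    nbrs = {v: set() for v in range(n)}
    for i, j in edges:
        nbrs[i].add(j)
        nbrs[j].add(i)
    rows = []
    for i, j in combinations(range(n), 2):
        y = 1 if j in nbrs[i] else 0
        rows.append((y, [1.0, float(len(nbrs[i] & nbrs[j]))]))
    return rows


def mple(rows, iters=50, tol=1e-10):
    """Two-parameter MPLE via Newton-Raphson logistic regression (no extra intercept).

    Returns (theta, naive_se); naive_se comes from the inverse Hessian at the
    last Newton iterate, the recipe the paper argues tends to be too small.
    """
    theta = [0.0, 0.0]
    for _ in range(iters):
        grad = [0.0, 0.0]
        h = [[0.0, 0.0], [0.0, 0.0]]  # observed information (negative Hessian)
        for y, x in rows:
            p = 1.0 / (1.0 + math.exp(-(theta[0] * x[0] + theta[1] * x[1])))
            w = p * (1.0 - p)
            for a in range(2):
                grad[a] += (y - p) * x[a]
                for b in range(2):
                    h[a][b] += w * x[a] * x[b]
        det = h[0][0] * h[1][1] - h[0][1] * h[1][0]
        inv = [[h[1][1] / det, -h[0][1] / det],
               [-h[1][0] / det, h[0][0] / det]]
        # Newton step: theta <- theta + inverse(information) * score.
        step = [inv[0][0] * grad[0] + inv[0][1] * grad[1],
                inv[1][0] * grad[0] + inv[1][1] * grad[1]]
        theta = [theta[0] + step[0], theta[1] + step[1]]
        if max(abs(s) for s in step) < tol:
            break
    se = [math.sqrt(inv[0][0]), math.sqrt(inv[1][1])]
    return theta, se
```

    Usage: `theta, se = mple(change_stats(6, [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (4, 5), (1, 4)]))` fits the two-term model to a seven-edge toy graph, with `theta[1]` as the triangle coefficient. The fit assumes the MPLE exists (no perfect separation of ties by the change statistics).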

  • ergm 4: New Features for Analyzing Exponential-Family Random Graph Models

    Journal of Statistical Software · 2023 · 61 citations

    Topics: Computer Science · Data Mining

    The ergm package supports the statistical analysis and simulation of network data. It anchors the statnet suite of packages for network analysis in R, introduced in a 2008 special issue of the Journal of Statistical Software. This article provides an overview of the new functionality in the 2021 release of ergm version 4, including more flexible handling of nodal covariates, term operators that extend and simplify model specification, new models for networks with valued edges, improved handling of constraints on the sample space of networks, and estimation with missing edge data. We also identify the new packages in the statnet suite that extend ergm's functionality to other network data types and structural features, as well as the robust set of online resources that support the statnet development process and applications.

  • Likelihood-based inference for exponential-family random graph models via linear programming

    Electronic Journal of Statistics · 2023-01-01 · 3 citations

    Article · Open access · Senior author

    The problem of determining whether a given point, or set of points, lies within the convex hull of another set of points in d dimensions arises naturally in the context of certain exponential family models in statistics. This article discusses the general convex hull problem and its application to the particular problem of modelling network data using an exponential-family random graph model (ERGM). While the convex hull question may be solved via a simple linear program, this approach is not well known in the statistical literature. The article also details several substantial improvements to the convex hull-testing algorithm currently implemented in the widely used ergm package for network modeling. It provides direct numerical comparisons of two linear programming packages for R that can be called by ergm and offers several illustrative examples.
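
    The linear program mentioned above can be written as a feasibility problem. In one standard formulation (not necessarily the exact one used in the paper or in ergm), a point y in R^d lies in the convex hull of points x_1, ..., x_m exactly when the following system has a solution:

```latex
y \in \operatorname{conv}\{x_1,\dots,x_m\}
\iff
\exists\, \lambda \in \mathbb{R}^m :\;
\lambda_i \ge 0 \ (i = 1,\dots,m),\quad
\sum_{i=1}^{m} \lambda_i = 1,\quad
\sum_{i=1}^{m} \lambda_i x_i = y .
```

    Any LP solver can test this by minimizing a zero objective subject to these d + 1 equality constraints and m nonnegativity constraints; the problem is feasible if and only if y lies in the hull.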

  • Improving ERGM starting values using simulated annealing

    Social Networks · 2023-11-07 · 4 citations

    Article · Open access · Senior author

    Much of the theory of estimation for exponential family models, which include exponential-family random graph models (ERGMs) as a special case, is well established, and maximum likelihood estimates (MLEs) in particular enjoy many desirable properties. However, for many ERGMs, direct calculation of MLEs is impossible, so methods for approximating MLEs and/or alternative estimation methods must be employed. Many MLE approximation algorithms require an alternative estimate as a starting point, and the maximum pseudo-likelihood estimator (MPLE) is frequently used for this purpose. Here, we discuss a potentially large class of such alternatives based on the fact that, unlike the MLE, the MPLE fails to satisfy the so-called “likelihood principle”. This means that different networks may have different MPLEs even if they have the same sufficient statistics. We exploit this fact to search for improved starting values for approximation-based MLE methods. The proposed method has proven its merit by producing an MLE for a network dataset and model that had defied estimation by all other known methods.
