Resume-aware faculty matching

Find professors who actually fit you

Upload your resume. Four AI agents analyze your background, rank the faculty who fit, inspect their recent research, and help you draft outreach — grounded in their actual work, not templates.

Jianqing Fan

Associated Faculty · Verified

Princeton University · Computer Science

Active 1986–2026

h-index: 119
Citations: 62.0k
Papers: 782 (199 in the last 5 years)
Funding: $11.7M (2 active)

About

Jianqing Fan is a statistician, financial econometrician, data scientist, and AI researcher. He is the Frederick L. Moore '18 Professor of Finance, Professor of Statistics, and Professor of Operations Research and Financial Engineering at Princeton University. He served as chair of the department from 2012 to 2015. His research focuses on statistics, finance, machine learning, and computational biology. Fan has received numerous awards, including the COPSS Presidents' Award (2000), the Morningside Gold Medal for Applied Mathematics (2007), and the Guy Medal in Silver (2014). He was elected an Academician of Academia Sinica in 2012, a member of the Royal Flemish Academy of Belgium in 2023, and a member of the National Academy of Sciences in 2026. He is associated with multiple departments and centers at Princeton, including the Department of Economics, the Department of Computer Science, the Department of Electrical Engineering, and various research centers related to statistics, finance, and energy.

Research topics

  • Computer Science
  • Artificial Intelligence
  • Statistics
  • Machine Learning
  • Mathematics
  • Combinatorics
  • Applied mathematics
  • Discrete mathematics
  • Algorithm
  • Data science

Selected publications

  • Inferences on mixing probabilities and ranking in mixed-membership models

    Journal of the American Statistical Association · 2026-05-18 · 3 citations

    preprint · Open access

    Network data is prevalent in numerous big data applications, including economics and health networks, where understanding the latent structure of the network is of prime importance. In this paper, we model the network using the Degree-Corrected Mixed Membership (DCMM) model. In the DCMM model, for each node $i$, there exists a membership vector $\pi_i = (\pi_i(1), \pi_i(2), \ldots, \pi_i(K))$, where $\pi_i(k)$ denotes the weight that node $i$ puts in community $k$. We derive a novel finite-sample expansion for the $\pi_i(k)$'s, which allows us to obtain asymptotic distributions and confidence intervals of the membership mixing probabilities and other related population quantities. This fills an important gap in uncertainty quantification on the member’s profile. We further develop a ranking scheme of the vertices based on the membership mixing probabilities on certain communities and perform relevant statistical inferences. A multiplier bootstrap method is proposed for ranking inference of individual membership profiles with respect to a given community. The validity of our theoretical results is further demonstrated via numerical experiments in both real and synthetic data examples.

  • Unearthing Financial Statement Fraud: Insights from News Coverage Analysis

    Management Science · 2025-09-05 · 1 citation

    article · 1st author · Corresponding

    We propose a financial statement (FS) fraud detection framework, called PeerMeta, that makes improvements in all three components of the detection procedure: label measurement, feature set, and detection model. For the label measurement, prior studies mainly adopt FS fraud events that have already been disclosed and confirmed. We construct a new measure based on news coverage that can reflect unrevealed FS fraud behaviors as well. For the feature set, we innovatively add peer factors learned through the business description texts in financial reports. For the detection model, two meta-learning algorithms are applied to aggregate the 19 popular classifiers. The results indicate that the proposed method has amazingly high recall of real fraud cases announced by regulatory authorities, reaching a staggering value of 0.982. We document that all components in PeerMeta contribute to the improvements of FS fraud detection and also showcase the significant economic value of the detection framework and find that recall is more crucial for the economic value than precision. This paper was accepted by Agostino Capponi, finance. Funding: This work was supported by the National Natural Science Foundation of China [Grants 71991470, 7199471, 72121002, 72310107002] and the National Key R&D Program of China [Grant 2021YFC3340703]. Supplemental Material: The online appendix and data files are available at https://doi.org/10.1287/mnsc.2023.03604.

  • Asymptotic Theory of Eigenvectors for Latent Embeddings with Generalized Laplacian Matrices

    ArXiv.org · 2025-03-01

    preprint · Open access · 1st author · Corresponding

    Laplacian matrices are commonly employed in many real applications, encoding the underlying latent structural information such as graphs and manifolds. The use of the normalization terms naturally gives rise to random matrices with dependency. It is well-known that dependency is a major bottleneck of new random matrix theory (RMT) developments. To this end, in this paper, we formally introduce a class of generalized (and regularized) Laplacian matrices, which contains the Laplacian matrix and the random adjacency matrix as a specific case, and suggest the new framework of the asymptotic theory of eigenvectors for latent embeddings with generalized Laplacian matrices (ATE-GL). Our new theory is empowered by the tool of generalized quadratic vector equation for dealing with RMT under dependency, and delicate high-order asymptotic expansions of the empirical spiked eigenvectors and eigenvalues based on local laws. The asymptotic normalities established for both spiked eigenvectors and eigenvalues will enable us to conduct precise inference and uncertainty quantification for applications involving the generalized Laplacian matrices with flexibility. We discuss some applications of the suggested ATE-GL framework and showcase its validity through some numerical examples.

  • How to Find Fantastic AI Papers: Self-Rankings as a Powerful Predictor of Scientific Impact Beyond Peer Review

    ArXiv.org · 2025-10-02

    preprint · Open access

    Peer review in academic research aims not only to ensure factual correctness but also to identify work of high scientific potential that can shape future research directions. This task is especially critical in fast-moving fields such as artificial intelligence (AI), yet it has become increasingly difficult given the rapid growth of submissions. In this paper, we investigate an underexplored measure for identifying high-impact research: authors' own rankings of their multiple submissions to the same AI conference. Grounded in game-theoretic reasoning, we hypothesize that self-rankings are informative because authors possess unique understanding of their work's conceptual depth and long-term promise. To test this hypothesis, we conducted a large-scale experiment at a leading AI conference, where 1,342 researchers self-ranked their 2,592 submissions by perceived quality. Tracking outcomes over more than a year, we found that papers ranked highest by their authors received twice as many citations as their lowest-ranked counterparts; self-rankings were especially effective at identifying highly cited papers (those with over 150 citations). Moreover, we showed that self-rankings outperformed peer review scores in predicting future citation counts. Our results remained robust after accounting for confounders such as preprint posting time and self-citations. Together, these findings demonstrate that authors' self-rankings provide a reliable and valuable complement to peer review for identifying and elevating high-impact research in AI.

  • Surface Atomic Defects and Self-Regulated CO Adsorption on Cu(111): Insights from High-Resolution Scanning Probe Microscopy

    Microscopy and Microanalysis · 2025-07-01 · 1 citation

    article
  • Fundamental Computational Limits in Pursuing Invariant Causal Prediction and Invariance-Guided Regularization

    ArXiv.org · 2025-01-29

    preprint · Open access · Senior author

    Pursuing invariant prediction from heterogeneous environments opens the door to learning causality in a purely data-driven way and has several applications in causal discovery and robust transfer learning. However, existing methods such as ICP [Peters et al., 2016] and EILLS [Fan et al., 2024] that can attain sample-efficient estimation are based on exponential time algorithms. In this paper, we show that such a problem is intrinsically hard in computation: the decision problem, testing whether a non-trivial prediction-invariant solution exists across two environments, is NP-hard even for the linear causal relationship. In the world where P$\neq$NP, our results imply that the estimation error rate can be arbitrarily slow using any computationally efficient algorithm. This suggests that pursuing causality is fundamentally harder than detecting associations when no prior assumption is pre-offered. Given there is almost no hope of computational improvement under the worst case, this paper proposes a method capable of attaining both computationally and statistically efficient estimation under additional conditions. Furthermore, our estimator is a distributionally robust estimator with an ellipse-shaped uncertain set where more uncertainty is placed on spurious directions than invariant directions, resulting in a smooth interpolation between the most predictive solution and the causal solution by varying the invariance hyper-parameter. Non-asymptotic results and empirical applications support the claim.

  • Uncertainty Quantification for Ranking with Heterogeneous Preferences

    ArXiv.org · 2025-09-02

    preprint · Open access · 1st author · Corresponding

    This paper studies human preference learning based on partially revealed choice behavior and formulates the problem as a generalized Bradley-Terry-Luce (BTL) ranking model that accounts for heterogeneous preferences. Specifically, we assume that each user is associated with a nonparametric preference function, and each item is characterized by a low-dimensional latent feature vector - their interaction defines the underlying low-rank score matrix. In this formulation, we propose an indirect regularization method for collaboratively learning the score matrix, which ensures entrywise $\ell_\infty$-norm error control - a novel contribution to the heterogeneous preference learning literature. This technique is based on sieve approximation and can be extended to a broader class of binary choice models where a smooth link function is adopted. In addition, by applying a single step of the Newton-Raphson method, we debias the regularized estimator and establish uncertainty quantification for item scores and rankings of items, both for the aggregated and individual preferences. Extensive simulation results from synthetic and real datasets corroborate our theoretical findings.

  • Communication-Efficient Distributed Estimation and Inference for Cox’s Model

    Journal of the American Statistical Association · 2025-06-18 · 2 citations

    article
  • The ICML 2023 Ranking Experiment: Examining Author Self-Assessment in ML/AI Peer Review

    Journal of the American Statistical Association · 2025-06-02 · 2 citations

    article
  • Transformers versus the EM Algorithm in Multi-class Clustering

    ArXiv.org · 2025-02-09

    preprint · Open access

    LLMs demonstrate significant inference capacities in complicated machine learning tasks, using the Transformer model as its backbone. Motivated by the limited understanding of such models on the unsupervised learning problems, we study the learning guarantees of Transformers in performing multi-class clustering of the Gaussian Mixture Models. We develop a theory drawing strong connections between the Softmax Attention layers and the workflow of the EM algorithm on clustering the mixture of Gaussians. Our theory provides approximation bounds for the Expectation and Maximization steps by proving the universal approximation abilities of multivariate mappings by Softmax functions. In addition to the approximation guarantees, we also show that with a sufficient number of pre-training samples and an initialization, Transformers can achieve the minimax optimal rate for the problem considered. Our extensive simulations empirically verified our theory by revealing the strong learning capacities of Transformers even beyond the assumptions in the theory, shedding light on the powerful inference capacities of LLMs.

Frequent coauthors

  • Yi Ren

    Guangxi Medical University

    131 shared
  • Kai Cao

    University Radiology

    114 shared
  • Wise Young

    Rutgers, The State University of New Jersey

    114 shared
  • Lin Leng

    Yale University

    110 shared
  • Richard Bucala

    Yale University

    110 shared
  • Iman Tadmori

    110 shared
  • Andreas Meinhardt

    Hudson Institute of Medical Research

    110 shared
  • Changshun Shao

    Changchun University of Science and Technology

    110 shared

Education

  • Ph.D., Department of Statistics

    University of California, Berkeley

    1989
  • Master's, Department of Statistics

    Institute of Applied Mathematics, Chinese Academy of Sciences

    1985
  • Bachelor's, Department of Mathematics

    Fudan University

    1982

Awards & honors

  • COPSS Presidents' Award (2000)
  • Morningside Gold Medal for Applied Mathematics (2007)
  • Guggenheim Fellow (2009)
  • Pao-Lu Hsu Prize (2013)
  • Guy Medal in Silver (2014)