
Cory McCartan
Assistant Professor of Statistics · New York University · Center for Data Science
Active 2020–2025
About
Cory McCartan is an Assistant Professor of Statistics at Penn State. His research focuses on the development and application of statistical methods, particularly in the context of data science and AI. He is part of a community of faculty fellows at the NYU Center for Data Science, where alumni have gone on to hold faculty positions at prestigious institutions such as Yale, the University of Chicago, Johns Hopkins, and École Polytechnique, as well as roles in industry and government agencies like OpenAI, Meta, and the FDA. The program emphasizes original research, interdisciplinary collaboration, and contributions to the future of AI and data science.
Research topics
- Political Science
- Computer Science
- Law
- Data Mining
- Sociology
- Artificial Intelligence
- Economics
- Mathematics
- Demography
- Algorithm
- Engineering
- Geography
- Political economy
- Actuarial science
- Biology
- Statistics
Selected publications
Relative Bias Under Imperfect Identification in Observational Causal Inference
ArXiv.org · 2025-07-31
preprint · Open access · Senior author
To conduct causal inference in observational settings, researchers must rely on certain identifying assumptions. In practice, these assumptions are unlikely to hold exactly. This paper considers the bias of selection-on-observables, instrumental variables, and proximal inference estimates under violations of their identifying assumptions. We develop bias expressions for IV and proximal inference that show how violations of their respective assumptions are amplified by any unmeasured confounding in the outcome variable. We propose a set of sensitivity tools that quantify the sensitivity of different identification strategies, and an augmented bias contour plot visualizes the relationship between these strategies. We argue that the act of choosing an identification strategy implicitly expresses a belief about the degree of violations that must be present in alternative identification strategies. Even when researchers intend to conduct an IV or proximal analysis, a sensitivity analysis comparing different identification strategies can help to better understand the implications of each set of assumptions. Throughout, we compare the different approaches on a re-analysis of the impact of state surveillance on the incidence of protest in Communist Poland.
Individual and Differential Harm in Redistricting
2025-08-26
preprint · Open access · 1st author · Corresponding
Social scientists have developed dozens of measures for assessing partisan bias in redistricting. But these measures are not easily adapted to other groups, including groups defined by race, class, or geography. Nor are they applicable to single- or no-party contexts, such as local redistricting. To overcome these limitations, we propose a unified framework of harm for evaluating the impacts of a districting plan on individual voters and the groups to which they belong. We consider a voter harmed if their chosen candidate is not elected under the current plan, but would be under a different plan. Harm improves on existing measures by both focusing on the choices of individual voters and directly incorporating counterfactual plans. We discuss strategies for estimating harm, and demonstrate the utility of our framework through analyses of partisan gerrymandering in New Jersey, voting rights litigation in Alabama, and racial dynamics of Boston City Council elections.
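The harm definition in this abstract can be illustrated with a short sketch. This is not the paper's estimator: it assumes, hypothetically, that each voter's chosen candidate and the winner in that voter's district under both the enacted plan and a single alternative plan are already known. All names and data below are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Voter:
    """Hypothetical record: a voter's choice and the winner they face under each plan."""
    group: str
    choice: str          # candidate (or party) the voter supports
    winner_enacted: str  # winner of the voter's district under the enacted plan
    winner_alt: str      # winner of the voter's district under a counterfactual plan


def is_harmed(v: Voter) -> bool:
    # Harmed: the chosen candidate is not elected under the enacted plan
    # but would be elected under the alternative plan.
    return v.choice != v.winner_enacted and v.choice == v.winner_alt


def harm_rate_by_group(voters):
    """Share of harmed voters within each group."""
    totals, harmed = {}, {}
    for v in voters:
        totals[v.group] = totals.get(v.group, 0) + 1
        harmed[v.group] = harmed.get(v.group, 0) + is_harmed(v)
    return {g: harmed[g] / totals[g] for g in totals}


# Illustrative (made-up) voters in two groups.
voters = [
    Voter("A", "D", "R", "D"),  # harmed: loses now, would win under the alternative
    Voter("A", "D", "D", "D"),  # wins under both plans
    Voter("B", "R", "D", "D"),  # loses under both plans
    Voter("B", "R", "R", "D"),  # wins under the enacted plan
]
rates = harm_rate_by_group(voters)
print(rates)
```

Comparing the per-group rates is one simple way to read "differential" harm; in practice vote choices and counterfactual winners would have to be estimated rather than observed.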
bases: Basis Expansions for Regression Modeling
2025-05-29
dataset · Open access · 1st author · Corresponding
Provides various basis expansions for flexible regression modeling, including random Fourier features (Rahimi & Recht, 2007) <https://proceedings.neurips.cc/paper_files/paper/2007/file/013a006f03dbc5392effeb8f18fda755-Paper.pdf>, exact kernel / Gaussian process feature maps, Bayesian Additive Regression Trees (BART) (Chipman et al., 2010) <doi:10.1214/09-AOAS285> prior features, and a helpful interface for n-way interactions. The provided functions may be used within any modeling formula, allowing the use of kernel methods and other basis expansions in modeling functions that do not otherwise support them. Along with the basis expansions, a number of kernel functions are also provided, which support kernel arithmetic to form new kernels. Basic ridge regression functionality is included as well.
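To make the random Fourier feature idea concrete, here is a minimal standard-library sketch of the Rahimi and Recht construction for a Gaussian kernel. It is not the interface of the bases package; the function names `make_rff` and `gaussian_kernel`, the length-scale parameterization, and the test points are assumptions chosen for illustration.

```python
import math
import random


def make_rff(dim, n_features, lengthscale=1.0, seed=0):
    """Return a feature map z(x) whose inner products approximate a Gaussian kernel."""
    rng = random.Random(seed)
    # Frequencies drawn from the kernel's spectral density: N(0, 1 / lengthscale^2).
    ws = [[rng.gauss(0.0, 1.0 / lengthscale) for _ in range(dim)]
          for _ in range(n_features)]
    # Random phases, uniform on [0, 2*pi].
    bs = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    scale = math.sqrt(2.0 / n_features)

    def features(x):
        return [scale * math.cos(sum(wj * xj for wj, xj in zip(w, x)) + b)
                for w, b in zip(ws, bs)]

    return features


def gaussian_kernel(x, y, lengthscale=1.0):
    """Exact kernel exp(-||x - y||^2 / (2 * lengthscale^2)) for comparison."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * lengthscale ** 2))


phi = make_rff(dim=2, n_features=5000)
x, y = [0.3, -0.2], [0.1, 0.4]
approx = sum(a * b for a, b in zip(phi(x), phi(y)))  # z(x) . z(y)
exact = gaussian_kernel(x, y)
print(approx, exact)
```

Because the feature map is an ordinary finite-dimensional expansion, its columns can be passed to any linear or ridge regression routine, which is the sense in which such expansions bring kernel methods to modeling functions that do not natively support them.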
Estimating Racial Disparities When Race is Not Observed
Journal of the American Statistical Association · 2025-07-15 · 3 citations
article · 1st author
redistverse: Easily Install and Load Redistricting Software
2024-06-18
dataset · Open access · Senior author
Easy installation, loading, and control of packages for redistricting data downloading, spatial data processing, simulation, analysis, and visualization. This package makes it easy to install and load multiple 'redistverse' packages at once. The 'redistverse' is developed and maintained by the Algorithm-Assisted Redistricting Methodology (ALARM) Project. For more details see <https://alarm-redist.org>.
Redistricting Reforms Reduce Gerrymandering by Constraining Partisan Actors
arXiv (Cornell University) · 2024-07-16
preprint · Open access · 1st author · Corresponding
Political actors often manipulate redistricting plans to gain electoral advantages, a process known as gerrymandering. Several states have implemented institutional reforms to address this problem, such as establishing map-drawing commissions. Estimating the impact of such reforms is challenging because each state structures its processes and rules differently. We model redistricting as a sequential game whose equilibrium solution summarizes multi-step institutional interactions as a univariate score. We argue this score measures the leeway political actors have over the partisan lean of the final plan. Using a differences-in-differences design, we demonstrate that reforms reduce partisan bias and increase competitiveness when they constrain partisan actors. We perform a counterfactual policy analysis to estimate the effects of enacting recent reforms nationwide. Though commissions generally reduce bias, reforms that restrict partisan actors in multiple ways, like removing veto points (Michigan), are more effective than commissions where parties retain some control (Ohio).
Evaluating bias and noise induced by the U.S. Census Bureau’s privacy protection methods
Science Advances · 2024-05-01 · 17 citations
article · Open access · Corresponding
The U.S. Census Bureau faces a difficult trade-off between the accuracy of Census statistics and the protection of individual information. We conduct an independent evaluation of bias and noise induced by the Bureau's two main disclosure avoidance systems: the TopDown algorithm used for the 2020 Census and the swapping algorithm implemented for the three previous Censuses. Our evaluation leverages the Noisy Measurement File (NMF) as well as two independent runs of the TopDown algorithm applied to the 2010 decennial Census. We find that the NMF contains too much noise to be directly useful without measurement error modeling, especially for Hispanic and multiracial populations. TopDown's postprocessing reduces the NMF noise and produces data whose accuracy is similar to that of swapping. While the estimated errors for both TopDown and swapping algorithms are generally no greater than other sources of Census error, they can be relatively substantial for geographies with small total populations.
Estimating Racial Disparities When Race is Not Observed
National Bureau of Economic Research · 2024-04-01 · 8 citations
report · Open access · 1st author · Corresponding
The estimation of racial disparities in various fields is often hampered by the lack of individual-level racial information. In many cases, the law prohibits the collection of such information to prevent direct racial discrimination. As a result, analysts have frequently adopted Bayesian Improved Surname Geocoding (BISG) and its variants, which combine individual names and addresses with Census data to predict race. Unfortunately, the residuals of BISG are often correlated with the outcomes of interest, generally attenuating estimates of racial disparities. To correct this bias, we propose an alternative identification strategy under the assumption that surname is conditionally independent of the outcome given (unobserved) race, residence location, and other observed characteristics. We introduce a new class of models, Bayesian Instrumental Regression for Disparity Estimation (BIRDiE), that take BISG probabilities as inputs and produce racial disparity estimates by using surnames as an instrumental variable for race. Our estimation method is scalable, making it possible to analyze large-scale administrative data. We also show how to address potential violations of the key identification assumptions. A validation study based on the North Carolina voter file shows that BISG+BIRDiE reduces error by up to 84% when estimating racial differences in party registration. Finally, we apply the proposed methodology to estimate racial differences in who benefits from the home mortgage interest deduction using individual-level tax data from the U.S.
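For readers unfamiliar with BISG, the prediction step that supplies BIRDiE's inputs is an application of Bayes' rule. The sketch below shows only that input step, not the BIRDiE model itself, and assumes the usual BISG condition that surname and location are independent given race. The function name and all probabilities are hypothetical.

```python
def bisg_posterior(p_race_given_surname, p_geo_given_race):
    """P(race | surname, geo) proportional to P(race | surname) * P(geo | race)."""
    unnorm = {r: p_race_given_surname[r] * p_geo_given_race[r]
              for r in p_race_given_surname}
    total = sum(unnorm.values())
    return {r: u / total for r, u in unnorm.items()}


# Made-up inputs: surname-based race probabilities, and the share of each
# racial group that lives in the individual's Census geography.
p_r_s = {"white": 0.6, "black": 0.3, "hispanic": 0.1}
p_g_r = {"white": 0.001, "black": 0.004, "hispanic": 0.002}

post = bisg_posterior(p_r_s, p_g_r)
print(post)
```

In this toy example the geography shifts most of the probability mass from the first group to the second, which shows how the location term can override a surname-based prior.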
Census officials must constructively engage with independent evaluations
Proceedings of the National Academy of Sciences · 2024-03-05 · 3 citations
letter · Open access
Estimating Racial Disparities When Race is Not Observed
SSRN Electronic Journal · 2024-01-01 · 2 citations
article · Open access · 1st author · Corresponding
Frequent coauthors
- Kosuke Imai · Harvard University · 40 shared
- Christopher T Kenny · 27 shared
- Tyler Simko · Princeton University · 19 shared
- Shiro Kuriwaki · Yale University · 14 shared
- Jacob R. Brown · 10 shared
- Evan Rosenman · Harvard University · 8 shared
- Ben Fifield · 5 shared
- Melissa M. Wu · Duke University · 4 shared
Awards & honors
- CDS Faculty Fellows and Moore-Sloan Fellows at CDS