Jen Tang

Professor · Verified

Purdue University · Quantitative Methods

Active 1987–2024

h-index 18
Citations 997
Papers401 last 5y

Research topics

  • Artificial Intelligence
  • Data Mining
  • Computer Science
  • Statistics
  • Algorithm
  • Mathematics

Selected publications

  • Clustering High-Dimensional Noisy Categorical Data

    Figshare · 2024-01-01

    dataset · Open access · Senior author

    Clustering is a widely used unsupervised learning technique that groups data into homogeneous clusters. However, when dealing with real-world data that contain categorical values, existing algorithms can be computationally costly in high dimensions and can struggle with noisy data that has missing values. Furthermore, except for one algorithm, no others provide theoretical guarantees of clustering accuracy. In this article, we propose a general categorical data encoding method and a computationally efficient spectral-based algorithm to cluster high-dimensional noisy categorical data (nominal or ordinal). Under a statistical model for data on m attributes from n subjects in r clusters with missing probability ϵ, we show that our algorithm exactly recovers the true clusters with high probability when mn(1−ϵ) ≥ CMr² log³ M, with M = max(n, m) and a fixed constant C. In addition, we show that mn(1−ϵ)² ≥ rδ/2 with 0 < δ < 1 is necessary for any algorithm to succeed with probability at least (1+δ)/2. In cases where m = n and r are fixed, the sufficient condition matches with the necessary condition up to a polylog(n) factor. In numerical studies our algorithm outperforms several existing algorithms in both clustering accuracy and computational efficiency. Supplementary materials for this article are available online.

  • Clustering High-Dimensional Noisy Categorical Data

    Journal of the American Statistical Association · 2024 · 4 citations

    Senior author · Corresponding
    • Computer Science
    • Artificial Intelligence
    • Data Mining


  • A Two-Stage Latent Variable Estimation Procedure for Time-Censored Accelerated Degradation Tests

    IEEE Transactions on Reliability · 2017-08-21 · 9 citations

    article · Senior author

    Parallel constant-stress accelerated degradation testing (PCSADT) is widely used to assess the reliability of highly reliable products in a timely manner when the products' degradation can be measured. Under a time-censored PCSADT, several groups of units are tested simultaneously, but under different stress levels, until a prespecified censoring time is reached. At this time, degradation values from the censored units, and failure times of the failed units are obtained. When the degradation follows a Wiener process where the parameters depend on the stress level through a life-stress model containing an unknown nuisance parameter, estimating this parameter often biases the maximum likelihood and least-squares estimators of the lifetime parameters. In this paper, we propose a two-stage procedure to address this problem. In the first stage, we transform the data under the different stress levels of a PCSADT so that the resulting data can be considered to have been obtained under normal stress. In the second stage, we introduce a latent variable for the unobserved degradation after the failure time for each failed unit to obtain a pseudodegradation value at the censoring time. We then use all degradation values (pseudo or observed) at the censoring time to develop latent variable estimators for all model parameters. Unlike other existing estimators, the proposed estimators are shown to be s-consistent, have closed-form expressions, and are easy to interpret. We use a real example of light-emitting diodes to illustrate the proposed method. In addition to proving s-consistencies, we conduct a simulation study to demonstrate that the proposed estimators also perform well in finite samples.

  • Equivalent step-stress accelerated life tests with log-location-scale lifetime distributions under Type-I censoring

    IIE Transactions · 2014-06-06 · 13 citations

    article · Senior author

    Accelerated Life Testing (ALT) is used to provide timely estimates of a product's lifetime distribution. Step-Stress ALT (SSALT) is one of the most widely adopted stress loadings and the optimum design of a SSALT plan has been extensively studied. However, few research efforts have been devoted to establishing the theoretical rationale for using SSALT in lieu of other types of stress loadings. This article proves the existence of statistically equivalent SSALT plans that can provide equally precise estimates to those derived from any continuous stress loading for the log-location-scale lifetime distributions with Type-I censoring. That is, for any optimization criterion based on the Fisher information matrix, SSALT is identical in comparison to other continuous stress loadings. The Weibull and lognormal distributions are introduced as special cases. For these two distributions, the relationship among statistical equivalencies is investigated and it is shown that two equivalent ALT plans must be equivalent in terms of the strongest version of equivalency for many objective functions. A numerical example for a ramp-stress ALT, using data from an existing study on miniature lamps, is used to illustrate equivalent SSALT plans. Results show that SSALT is not only equivalent to the existing ramp-stress test plans but also more cost-effective in terms of the total test cost.

  • Minimum cost allocation of quality improvement targets under supplier process disruption

    RePEc: Research Papers in Economics · 2014-01-01 · 3 citations

    article · Senior author

    This paper presents a system cost model to assist a manufacturer in assessing the minimum cost allocations of quality improvement targets to suppliers. The model accounts for the effects of autonomous learning and induced learning on quality improvement, via variance reductions of supplier processes. The model further accounts for the effects of planned and unplanned disruptions in supplier production processes, where such gaps in production decrease the amount of autonomous learning while providing an opportunity for induced learning, thereby counteracting the effect of disruptions on process improvement. An optimization model is developed that obtains the quality improvement allocations that minimize system expected cost to both suppliers and manufacturer. The proposed models also account for both the uncertainty in the realized induced learning rate as well as uncertainty in the realized level of process disruptions. An example is used to demonstrate an implementation of the proposed models and to assess the sensitivity of the optimal target allocations to several model parameters.

  • Optimum step-stress accelerated degradation test for Wiener degradation process under constraints

    European Journal of Operational Research · 2014-09-22 · 136 citations

    article · Senior author
  • Minimum cost allocation of quality improvement targets under supplier process disruption

    European Journal of Operational Research · 2013-02-09 · 24 citations

    article · Senior author
  • Statistical equivalency and optimality of simple step‐stress accelerated test plans for the exponential distribution

    Naval Research Logistics (NRL) · 2012-12-20 · 26 citations

    article · Senior author

    Accelerated life testing (ALT) is commonly used to obtain reliability information about a product in a timely manner. Several stress loading designs have been proposed and recent research interests have emerged concerning the development of equivalent ALT plans. Step‐stress ALT (SSALT) is one of the most commonly used stress loadings because it usually shortens the test duration and reduces the number of required test units. This article considers two fundamental questions when designing a SSALT and provides formal proofs in answer to each. Namely: (1) can a simple SSALT be designed so that it is equivalent to other stress loading designs? (2) when optimizing a multilevel SSALT, does it degenerate to a simple SSALT plan? The answers to both queries, under certain reasonable model assumptions, are shown to be a qualified YES. In addition, we provide an argument to support the rationale of a common practice in designing a SSALT, that is, setting the higher stress level as high as possible in a SSALT plan.

  • Step-stress accelerated life tests: a proportional hazards–based non-parametric model

    IIE Transactions · 2012-06-14 · 20 citations

    article · Senior author

    Using data from a simple step-stress accelerated life test procedure, a non-parametric proportional hazards model is proposed for obtaining upper confidence bounds for the cumulative failure probability of a product under normal use conditions. The approach is non-parametric in the sense that most of the functions involved in the model do not assume any specific forms, except for certain verifiable conditions. Test statistics are introduced to verify assumptions about the model and to test the goodness of fit of the proposed model to the data. A numerical example, using data simulated from the lifetime distribution of an existing parametric study on metal-oxide semiconductor capacitors, is used to illustrate the proposed methods. Discussions on how to determine the optimal stress levels and sample size are also given.

  • Methods for identifying influential variables in an out-of-control multivariate normal process

    Statistica Sinica · 2011-07-27 · 1 citation

    article · Senior author

    Hotelling's T² is a well-known statistic for testing the mean vector of a multivariate normal distribution. Control charts based on T² have been widely used in statistical process control for monitoring a multivariate process. Although it is a powerful tool, the T² statistic has a practical problem, namely, that a significant T²-value that normally signals an overall out-of-control condition in the process mean vector does not provide direct information about which variable or group of variables may have caused this out-of-control condition. We propose a diagnostic method to identify the influential variable(s) for cases with and without a specified out-of-control mean vector. Our approach, based on the likelihood principle, computes the conditional likelihood of a variable or sub-group of variables causing or not causing the overall out-of-control condition. Unlike many existing methods, our method assumes that an out-of-control condition already exists; hence, all conditional likelihoods in this paper are based on non-central distributions of the monitoring/testing statistics. By comparing these conditional likelihoods, we identify the influential variable(s). We use an example from the literature to illustrate our method and to demonstrate its effectiveness.
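
For readers unfamiliar with the statistic named in the last abstract, here is a minimal sketch of the standard Hotelling's T² value for a single observation. It is not the diagnostic method from the paper. It assumes the in-control mean vector and covariance matrix are known, and the function names (`solve`, `hotelling_t2`) and the sample numbers are hypothetical, chosen only for illustration.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build the augmented matrix [a | b] so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def hotelling_t2(x, mu0, sigma):
    """Return (x - mu0)^T sigma^{-1} (x - mu0) for one observation x.

    Assumes mu0 (in-control mean) and sigma (in-control covariance) are known.
    """
    d = [xi - mi for xi, mi in zip(x, mu0)]
    y = solve(sigma, d)  # y = sigma^{-1} d, without forming the inverse
    return sum(di * yi for di, yi in zip(d, y))


# Hypothetical two-variable process with correlated variables.
mu0 = [0.0, 0.0]
sigma = [[1.0, 0.5], [0.5, 1.0]]
observation = [2.0, -1.0]

t2 = hotelling_t2(observation, mu0, sigma)
# With known parameters and 2 variables, T² is chi-square with 2 degrees of
# freedom when in control, so the upper limit at level alpha is -2 ln(alpha).
alpha = 0.01
limit = -2.0 * math.log(alpha)
print(t2, limit, t2 > limit)
```

In this example T² is 28/3 (about 9.33), which exceeds the limit of about 9.21, so the chart signals. The single number does not indicate whether the first variable, the second, or their joint behaviour is responsible, which is the gap the abstract describes.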

Frequent coauthors

  • Robert Plante

    Purdue University West Lafayette

    12 shared
  • Herbert Moskowitz

    Purdue University West Lafayette

    12 shared
  • Kwei Tang

    National Chengchi University

    12 shared
  • Weijia Wang

    Emory University

    4 shared
  • Sheng‐Tsaing Tseng

    National Tsing Hua University

    3 shared
  • Regina Y. Liu

    Rutgers, The State University of New Jersey

    2 shared
  • Peter Sing-Lai Lam

    PAREXEL International (United States)

    2 shared
  • Suresh Chand

    Purdue University West Lafayette

    2 shared