Leman Akoglu

Associate Professor

Carnegie Mellon University · Heinz College

Active 2006–2025

h-index: 45
Citations: 8.8k
Papers: 220 (87 in the last 5 years)
Funding: $1.7M

About

Leman Akoglu is the Heinz College Dean's Associate Professor of Information Systems at Carnegie Mellon University, a tenured position, and directs the Data Analytics Techniques Algorithms (DATA) Lab at Heinz College. Akoglu also holds courtesy appointments in the Machine Learning Department and the Computer Science Department of the School of Computer Science. Akoglu holds a Ph.D. in Computer Science from Carnegie Mellon University (2012) and a B.S. in Computer Science from Bilkent University (2007).

Akoglu's research spans data mining, graph mining, machine learning, and knowledge discovery, with a specific focus on anomalies: identifying and characterizing 'what stands out' in large-scale, time-varying, multi-modal data sources through scalable computational methods. This work includes contributions to anomaly detection, outlier detection, hyperparameter sensitivity analysis, graph neural networks, and foundation models, with applications in healthcare, finance, and social networks. Akoglu also takes part in research collaborations, gives keynote talks, and organizes workshops and conferences.

Research topics

  • Computer Science
  • Artificial Intelligence
  • Computer Security
  • Machine Learning
  • Data Mining
  • Mathematics
  • Database
  • Data science
  • Human–computer interaction
  • Theoretical computer science
  • World Wide Web

Selected publications

  • Trajectory Anomaly Detection with By-Design Complementary Detectors

    Society for Industrial and Applied Mathematics eBooks · 2025-01-01 · 2 citations

    book-chapter · Senior author

    Trajectory anomaly detection is critical across a wide range of applications, from traffic control, and wildlife conservation, to public transportation optimization. However, detecting anomalies in trajectory data is challenging due to the diverse nature of anomalies. In this paper, we propose CETrajAD, an ensemble method for trajectory anomaly detection that integrates complementary detectors, each targeting different aspects of trajectory anomalies. Our approach leverages three types of trajectory embeddings—Route, Speed, and Shape—that vary in their sensitivity to length, direction, shape, and speed, enabling the detection of diverse anomaly types. We combine detectors from both the embedding and input spaces and show how their complementary nature improves anomaly detection performance. Through theoretical analysis, we demonstrate the conditions when the proposed ensemble design outperforms traditional ensemble methods. Experiments on multiple real-world datasets, containing both simulated and ground-truth anomalies, show that the proposed model consistently outperforms existing baselines.

  • End-To-End Self-Tuning Self-Supervised Time Series Anomaly Detection

    Society for Industrial and Applied Mathematics eBooks · 2025-01-01 · 1 citation

    book-chapter · Senior author

    Time series anomaly detection (TSAD) finds many applications such as monitoring environmental sensors, industry KPIs, patient biomarkers, etc. A two-fold challenge for TSAD is a versatile and unsupervised model that can detect various different types of time series anomalies (spikes, discontinuities, trend shifts, etc.) without any labeled data. Modern neural networks have outstanding ability in modeling complex time series. Self-supervised models in particular tackle unsupervised TSAD by transforming the input via various augmentations to create pseudo anomalies for training. However, their performance is sensitive to the choice of augmentation, which is hard to choose in practice, while there exists no effort in the literature on data augmentation tuning for TSAD without labels. Our work aims to fill this gap. We introduce TSAP for TSA “on autoPilot”, which can (self-)tune augmentation hyperparameters end-to-end. It stands on two key components: a differentiable augmentation architecture and an unsupervised validation loss to effectively assess the alignment between augmentation type and anomaly type. Case studies show TSAP’s ability to effectively select the (discrete) augmentation type and associated (continuous) hyperparameters. In turn, it outperforms established baselines, including SOTA self-supervised models, on diverse TSAD tasks exhibiting different anomaly types.

  • Can Machine Learning Target Health Care Fraud? Evidence From Medicare Hospitalizations

    Journal of Policy Analysis and Management · 2025-12-25 · 1 citation

    article · Open access · Senior author

    The US spends more than $4 trillion per year on health care, largely conducted by private providers and reimbursed by insurers. A major concern in this system is overbilling and fraud by hospitals, who face incentives to misreport their claims to receive higher payments. In this work, we develop novel machine learning tools to identify hospitals that overbill insurers, which can be used to guide investigations and auditing of suspicious hospitals for both public and private health insurance systems. Using large-scale claims data from Medicare, the US federal health insurance program for the elderly and disabled, we identify patterns consistent with fraud among inpatient hospitalizations. Our proposed approach for fraud detection is fully unsupervised, not relying on any labeled training data, and is explainable to end users, providing interpretations for which diagnosis, procedure, and billing codes lead to hospitals being labeled suspicious. Using newly collected data from the Department of Justice on hospitals facing anti-fraud lawsuits, and case studies of suspicious hospitals, we validate our approach and findings. Our method provides a nearly 5-fold lift over random targeting of hospitals. We also perform a post-analysis to understand which hospital characteristics, not used for detection, are associated with suspiciousness.

  • Pard: Permutation-Invariant Autoregressive Diffusion for Graph Generation

    arXiv (Cornell University) · 2024-02-06 · 1 citation

    preprint · Open access · Senior author

    Graph generation has been dominated by autoregressive models due to their simplicity and effectiveness, despite their sensitivity to ordering. Yet diffusion models have garnered increasing attention, as they offer comparable performance while being permutation-invariant. Current graph diffusion models generate graphs in a one-shot fashion, but they require extra features and thousands of denoising steps to achieve optimal performance. We introduce PARD, a Permutation-invariant Auto Regressive Diffusion model that integrates diffusion models with autoregressive methods. PARD harnesses the effectiveness and efficiency of the autoregressive model while maintaining permutation invariance without ordering sensitivity. Specifically, we show that contrary to sets, elements in a graph are not entirely unordered and there is a unique partial order for nodes and edges. With this partial order, PARD generates a graph in a block-by-block, autoregressive fashion, where each block's probability is conditionally modeled by a shared diffusion model with an equivariant network. To ensure efficiency while being expressive, we further propose a higher-order graph transformer, which integrates transformer with PPGN. Like GPT, we extend the higher-order graph transformer to support parallel training of all blocks. Without any extra features, PARD achieves state-of-the-art performance on molecular and non-molecular datasets, and scales to large datasets like MOSES containing 1.9M molecules. Pard is open-sourced at https://github.com/LingxiaoShawn/Pard.

  • Outlier Detection Bias Busted: Understanding Sources of Algorithmic Bias through Data-centric Factors

    arXiv (Cornell University) · 2024-08-24

    preprint · Open access · Senior author

    The astonishing successes of ML have raised growing concern for the fairness of modern methods when deployed in real world settings. However, studies on fairness have mostly focused on supervised ML, while unsupervised outlier detection (OD), with numerous applications in finance, security, etc., have attracted little attention. While a few studies proposed fairness-enhanced OD algorithms, they remain agnostic to the underlying driving mechanisms or sources of unfairness. Even within the supervised ML literature, there exists debate on whether unfairness stems solely from algorithmic biases (i.e. design choices) or from the biases encoded in the data on which they are trained. To close this gap, this work aims to shed light on the possible sources of unfairness in OD by auditing detection models under different data-centric factors. By injecting various known biases into the input data -- as pertain to sample size disparity, under-representation, feature measurement noise, and group membership obfuscation -- we find that the OD algorithms under the study all exhibit fairness pitfalls, although differing in which types of data bias they are more susceptible to. Most notable of our study is to demonstrate that OD algorithm bias is not merely a data bias problem. A key realization is that the data properties that emerge from bias injection could as well be organic -- as pertain to natural group differences w.r.t. sparsity, base rate, variance, and multi-modality. Either natural or biased, such data properties can give rise to unfairness as they interact with certain algorithmic design choices.

  • Descriptive Kernel Convolution Network with Improved Random Walk Kernel

    2024-05-08 · 1 citation

    article · Open access · Senior author

    Graph kernels used to be the dominant approach to feature engineering for structured data, which are superseded by modern GNNs as the former lacks learnability. Recently, a suite of Kernel Convolution Networks (KCNs) successfully revitalized graph kernels by introducing learnability, which convolves input with learnable hidden graphs using a certain graph kernel. The random walk kernel (RWK) has been used as the default kernel in many KCNs, gaining increasing attention. In this paper, we first revisit the RWK and its current usage in KCNs, revealing several shortcomings of the existing designs, and propose an improved graph kernel RWK+, by introducing color-matching random walks and deriving its efficient computation. We then propose RWK+CN, a KCN that uses RWK+ as the core kernel to learn descriptive graph features with an unsupervised objective, which can not be achieved by GNNs. Further, by unrolling RWK+, we discover its connection with a regular GCN layer, and propose a novel GNN layer RWK+Conv. In the first part of experiments, we demonstrate the descriptive learning ability of RWK+CN with the improved random walk kernel RWK+ on unsupervised pattern mining tasks; in the second part, we show the effectiveness of RWK+ for a variety of KCN architectures and supervised graph learning tasks, and demonstrate the expressiveness of RWK+Conv layer, especially on the graph-level tasks. RWK+ and RWK+Conv adapt to various real-world applications, including web applications such as bot detection in a web-scale Twitter social network, and community classification in Reddit social interaction networks.

  • Machine Learning in Finance

    2024-08-24

    article · 1st author · Corresponding

    This workshop aims to explore the intersection of Generative AI with the rich tapestry of financial data types, seeking to uncover new methodologies and techniques that can enhance predictive analytics, fraud detection, and customer insights across the sector. By harnessing these advancements in AI, we can pave the way to not only understand customer behavior but also anticipate their needs more effectively, leading to superior customer outcomes and more personalized services. Our objective is to shed light on the challenges and opportunities presented by the diverse data formats in finance. We aim to bridge the gap between the dominance of traditional models for tabular data analysis and the emerging potential of Generative AI to revolutionize the treatment of time series, click streams, and other unstructured data forms.

  • End-To-End Self-Tuning Self-Supervised Time Series Anomaly Detection

    arXiv (Cornell University) · 2024-04-03

    preprint · Open access · Senior author

    Time series anomaly detection (TSAD) finds many applications such as monitoring environmental sensors, industry KPIs, patient biomarkers, etc. A two-fold challenge for TSAD is a versatile and unsupervised model that can detect various different types of time series anomalies (spikes, discontinuities, trend shifts, etc.) without any labeled data. Modern neural networks have outstanding ability in modeling complex time series. Self-supervised models in particular tackle unsupervised TSAD by transforming the input via various augmentations to create pseudo anomalies for training. However, their performance is sensitive to the choice of augmentation, which is hard to choose in practice, while there exists no effort in the literature on data augmentation tuning for TSAD without labels. Our work aims to fill this gap. We introduce TSAP for TSA "on autoPilot", which can (self-)tune augmentation hyperparameters end-to-end. It stands on two key components: a differentiable augmentation architecture and an unsupervised validation loss to effectively assess the alignment between augmentation type and anomaly type. Case studies show TSAP's ability to effectively select the (discrete) augmentation type and associated (continuous) hyperparameters. In turn, it outperforms established baselines, including SOTA self-supervised models, on diverse TSAD tasks exhibiting different anomaly types.

  • FoMo-0D: A Foundation Model for Zero-shot Tabular Outlier Detection

    arXiv (Cornell University) · 2024-09-09

    preprint · Open access · Senior author

    Outlier detection (OD) has a vast literature as it finds numerous real-world applications. Being an unsupervised task, model selection is a key bottleneck for OD without label supervision. Despite a long list of available OD algorithms with tunable hyperparameters, the lack of systematic approaches for unsupervised algorithm and hyperparameter selection limits their effective use in practice. In this paper, we present FoMo-0D, a pre-trained Foundation Model for zero/0-shot OD on tabular data, which bypasses the hurdle of model selection altogether. Having been pre-trained on synthetic data, FoMo-0D can directly predict the (outlier/inlier) label of test samples without parameter fine-tuning -- requiring no labeled data, and no additional training or hyperparameter tuning when given a new task. Extensive experiments on 57 real-world datasets against 26 baselines show that FoMo-0D is highly competitive; outperforming the majority of the baselines with no statistically significant difference from the 2nd best method. Further, FoMo-0D is efficient in inference time requiring only 7.7 ms per sample on average, with at least 7x speed-up compared to previous methods. To facilitate future research, our implementations for data synthesis and pre-training as well as model checkpoints are openly available at https://github.com/A-Chicharito-S/FoMo-0D.

  • Descriptive Kernel Convolution Network with Improved Random Walk Kernel

    arXiv (Cornell University) · 2024-02-08

    preprint · Open access · Senior author

    Graph kernels used to be the dominant approach to feature engineering for structured data, which are superseded by modern GNNs as the former lacks learnability. Recently, a suite of Kernel Convolution Networks (KCNs) successfully revitalized graph kernels by introducing learnability, which convolves input with learnable hidden graphs using a certain graph kernel. The random walk kernel (RWK) has been used as the default kernel in many KCNs, gaining increasing attention. In this paper, we first revisit the RWK and its current usage in KCNs, revealing several shortcomings of the existing designs, and propose an improved graph kernel RWK+, by introducing color-matching random walks and deriving its efficient computation. We then propose RWK+CN, a KCN that uses RWK+ as the core kernel to learn descriptive graph features with an unsupervised objective, which can not be achieved by GNNs. Further, by unrolling RWK+, we discover its connection with a regular GCN layer, and propose a novel GNN layer RWK+Conv. In the first part of experiments, we demonstrate the descriptive learning ability of RWK+CN with the improved random walk kernel RWK+ on unsupervised pattern mining tasks; in the second part, we show the effectiveness of RWK+ for a variety of KCN architectures and supervised graph learning tasks, and demonstrate the expressiveness of RWK+Conv layer, especially on the graph-level tasks. RWK+ and RWK+Conv adapt to various real-world applications, including web applications such as bot detection in a web-scale Twitter social network, and community classification in Reddit social interaction networks.
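
The Medicare fraud paper listed above reports a nearly 5-fold lift over random targeting of hospitals. As a rough illustration of what such a figure means, the sketch below computes lift under one common definition: the share of true positives among the k highest-scored items, divided by the share of true positives overall. The function name `lift_at_k` and the toy scores and labels are hypothetical and invented for this example; the paper's own evaluation protocol may differ.

```python
def lift_at_k(scores, labels, k):
    """Precision among the k highest-scored items divided by the overall base rate.

    Assumed definition for illustration only, not taken from the paper.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not 0 < k <= len(scores):
        raise ValueError("k must be between 1 and len(scores)")
    base_rate = sum(labels) / len(labels)
    if base_rate == 0:
        raise ValueError("lift is undefined when there are no positives")
    # Rank items from most to least suspicious.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    precision_at_k = sum(label for _, label in ranked[:k]) / k
    return precision_at_k / base_rate


# Toy data, made up for this example: 10 items, 2 of them true positives.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
print(lift_at_k(scores, labels, k=2))
```

In this toy case the two top-scored items are both positives while only 2 in 10 items are positive overall, so auditing the top 2 finds positives at five times the rate of picking at random.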

Education

  • Ph.D., Computer Science

    Carnegie Mellon University

    2012
  • B.S., Computer Science

    Bilkent University

    2007

Awards & honors

  • Heinz College Dean's Professor, Feb 2019-2022
  • Best Research Paper Award, SIAM SDM 2019
  • Best Student Machine Learning Paper Runner-up Award, ECML PK…
  • NSF CAREER Award, 2015-2020
  • Best Research Paper Runner-up Award, SIAM SDM 2016