Youakim Badr

Professor of Data Analytics and Artificial Intelligence · Verified

Pennsylvania State University · Artificial Intelligence

Active 2001–2026

h-index: 21
Citations: 2.4k
Papers: 244 (45 in the last 5 years)

About

Dr. Youakim Badr is a Professor of Data Analytics and Artificial Intelligence at The Pennsylvania State University, Great Valley School of Graduate Professional Studies. He serves as Professor-in-Charge of the Master of Artificial Intelligence program across Penn State Great Valley and World Campus. He is also the Founding Director of the Trustworthy Intelligence and Thinking Machine Lab (THINK Lab) and a Research Fellow with Penn State’s Center for Socially Responsible Artificial Intelligence and Clinical. Dr. Badr received his Ph.D. in Computer Science from the National Institute of Applied Sciences of Lyon (INSA-Lyon), France, in 2003, and held a tenured faculty appointment at INSA-Lyon before joining Penn State. His research advances trustworthy, secure, scalable, and composable AI systems, with emphasis on agentic AI, AI analytics systems, and trustworthy AI. He has advised and co-advised more than 20 doctoral students and numerous graduate students internationally, leading or contributing to funded research projects supported by agencies and partners in the United States, France, Europe, and industry. Dr. Badr has played a significant role in designing, launching, and scaling Penn State’s Master of Artificial Intelligence program, developing multiple stackable certificates and graduate AI courses, and expanding interdisciplinary AI pathways in collaboration with programs in data analytics, software engineering, computer science, and business. He has authored or co-authored over 150 peer-reviewed publications and has received multiple awards for research, teaching, innovation, service, and student engagement from Penn State and the French Ministry of Higher Education and Research. 
His professional activities include serving as a court-appointed scientific expert in artificial intelligence, evaluating AI and data-intensive systems for national and international funding panels, and participating actively in professional communities such as ACM, EDUCAUSE, INFORMS, Linux Foundation AI & Data, and the Agentic AI Foundation.

Research topics

  • Computer Science
  • Artificial Intelligence
  • Machine Learning
  • Natural Language Processing
  • Statistics
  • Control engineering
  • Engineering
  • Medicine
  • Surgery
  • Mathematics
  • Mechanical engineering
  • World Wide Web
  • Simulation
  • Automotive engineering
  • Electrical engineering

Selected publications

  • ParaKab – Many Languages, One Kabyle: A Multilingual Parallel Corpus for a Low-Resource Language

    Open MIND · 2026-03-05

    dataset · Senior author

    Description of the Dataset: This dataset consists of three parallel corpora involving the Kabyle language, covering the following language pairs: Kabyle-Arabic, Kabyle-French, and Kabyle-English. Together, these corpora comprise approximately one million (1M) aligned sentence pairs, forming a multilingual Kabyle-centric parallel dataset. The resource is intended for research in natural language processing (NLP), particularly for low-resource languages (LRLs), machine translation, and cross-linguistic studies. The data were compiled from multiple sources, primarily from publicly available resources distributed via the OPUS platform, as well as selected public websites. All texts underwent a rigorous pipeline including dataset selection, cleaning, normalization, alignment, verification, and deduplication, resulting in a ready-to-use dataset for research and machine learning applications.

    Academic Context: This dataset was developed as part of a university research project on low-resource languages, focusing on Kabyle, a northern Tamazight (Berber) language. The project aims to contribute to the digital inclusion of low-resource languages and to support their integration into modern NLP systems.

    Licensing and Data Rights: Each sentence pair in the dataset is linked to its original data source and, where applicable, its corresponding license or rights holder, ensuring transparency and compliance with source-specific terms of use. For the publication of this corpus, a general license is provided for the dataset as a whole. Users are responsible for respecting the original licenses and usage conditions associated with each source when reusing the data.

  • Beyond All-Reduce: Event-Driven Model Parallelism Without Collective Communication Primitives (EBD2N)

    Research Square · 2026-03-05

    preprint · Open access

  • ParaKab – Many Languages, One Kabyle: A Multilingual Parallel Corpus for a Low-Resource Language

    Zenodo (CERN European Organization for Nuclear Research) · 2026-03-05

    dataset · Open access · Senior author

  • Editorial: Journal of Cyber Security and Risk Auditing

    Journal of Cyber Security and Risk Auditing · 2025-05-01

    editorial · Open access · 1st author · Corresponding

    Dear Readers, It is with great pleasure that we introduce to you our upcoming journal, "Journal of Cyber Security and Risk Auditing." This journal is dedicated to exploring the advancements in the field of cybersecurity and providing a platform for researchers and scholars to exchange ideas, fostering progress in the area of cybersecurity and risk auditing. On behalf of the editorial team, I extend our heartfelt gratitude and a warm welcome to the scholars, experts, researchers, and readers who support and follow our journal.

    Purpose of the Journal: The Journal of Cyber Security and Risk Auditing aims to promote the development of cybersecurity fields, enhance the research level of cybersecurity technologies, and strengthen academic exchanges on an international scale. We are committed to building an open, inclusive, and innovative platform for researchers in the field of cybersecurity to present their findings, share experiences, and exchange ideas.

  • Federated Learning in Healthcare: A Bibliometric Analysis of Privacy, Security, and Adversarial Threats (2021-2024)

    Shifra. · 2025-01-17 · 28 citations

    article · Open access

    Federated Learning (FL) has rapidly emerged as a transformative machine learning approach, enabling healthcare institutions to collaboratively build predictive models without compromising patient data privacy. As healthcare increasingly adopts digital technologies, federated learning offers promising solutions to critical issues such as data privacy, security, data poisoning, and adversarial attacks. Despite the recognized potential of FL, significant gaps persist in existing research, particularly concerning comprehensive security frameworks and practical healthcare applications. This bibliometric analysis systematically explores the research landscape from 2021 to 2024, explicitly focusing on data privacy, security threats, and adversarial attacks within federated learning in healthcare. Utilizing bibliometric data from the Scopus database, the study identifies key thematic trends, evaluates global collaborative networks, and assesses contributions from leading institutions and countries. Findings reveal rapidly growing scholarly interest, robust international collaboration, and notable institutional contributions, with a specific emphasis on privacy-preserving techniques, healthcare-specific applications, and emerging technologies such as blockchain and edge computing. The analysis also highlights critical limitations due to incomplete bibliographic metadata. This research provides a comprehensive understanding of current trends and identifies future directions to enhance the security and privacy framework of federated learning in healthcare.

  • Leveraging Graph Neural Networks for Attack Detection in IoT Systems

    Lecture notes in computer science · 2025-01-01

    book-chapter · Open access

  • Enhancing Trust in Central Differential Privacy Using zk-SNARKs and Cryptographic Hashes

    Lecture notes on data engineering and communications technologies · 2025-01-01 · 1 citation

    book-chapter · Open access

  • A Survey on Acoustic Side-Channel Attacks: An Artificial Intelligence Perspective

    Journal of Cybersecurity and Privacy · 2025-12-29

    article · Open access · Senior author

    Acoustic Side-Channel Attacks (ASCAs) exploit the sound produced by keyboards and other devices to infer sensitive information without breaching software or network defenses. Recent advances in deep learning, large language models, and signal processing have greatly expanded the feasibility and accuracy of these attacks. To clarify the evolving threat landscape, this survey systematically reviews ASCA research published between January 2020 and February 2025. We categorize modern ASCA methods into three levels of text reconstruction—individual keystrokes, short text (words/phrases), and long-text regeneration— and analyze the signal processing, machine learning, and language-model decoding techniques that enable them. We also evaluate how environmental factors such as microphone placement, ambient noise, and keyboard design influence attack performance, and we examine the challenges of generalizing laboratory-trained models to real-world settings. This survey makes three primary contributions: (1) it provides the first structured taxonomy of ASCAs based on text generation granularity and decoding methodology; (2) it synthesizes cross-study evidence on environmental and hardware factors that fundamentally shape ASCA performance; and (3) it consolidates emerging countermeasures, including Generative Adversarial Network-based noise masking, cryptographic defenses, and environmental mitigation, while identifying open research gaps and future threats posed by voice-enabled IoT and prospective quantum side-channels. Together, these insights underscore the need for interdisciplinary, multi-layered defenses against rapidly advancing ASCA techniques.

  • Digital Signature Quantification in the Bitcoin Blockchain: A Statistical Approach

    2025-06-18

    article

    Digital signatures are crucial for blockchain security, ensuring transaction authenticity, integrity, and nonrepudiation. This reliance on secure digital signatures also extends to emerging blockchain applications like in the Internet of Vehicles (IoV) and the Internet of Devices (IoD). However, the emergence of Post-Quantum Cryptography (PQC) poses a potential threat to blockchain performance by significantly increasing the size of digital signatures and public keys. This paper examines the number of signatures per block in the Bitcoin network using a real-world data-driven analysis. We propose an efficient signature counting algorithm that processes transaction data and accounts for all acceptable digital signature formats. To evaluate the methodology and the quality of the data, we apply the Shapiro-Wilk test for normality and use the Coefficient of Variation (CV) to assess data distribution and variability. This research advances blockchain analytics by offering a systematic approach to quantifying digital signatures.

  • SLIM: Stateless-Based Lightweight Sharding Mechanism for Secure Data Interchange in Blockchain-Enabled Internet of Vehicles

    IEEE Transactions on Intelligent Transportation Systems · 2025-12-24

    article

    The emergence of Blockchain-enabled Internet of Vehicles stands as a critical method for secure data exchange between vehicles and traffic infrastructures. Yet, significant challenges persist, particularly in terms of limited fault tolerance and constraints associated with storage and computational resources. In this study, we introduce the Stateless-based Lightweight Sharding Mechanism (SLIM) for enhanced secure data exchange. This mechanism is built on three key strategies: 1) SLIM incorporates a robust sharding protocol which includes identity establishment and verification, shard formation, and intra-shard consensus to mitigate the impact of Byzantine nodes within each shard. This protocol is designed to reduce the influence of malicious nodes in each shard, thereby improving fault tolerance and security. 2) To address storage challenges, we employ a stateless verification algorithm to allow service nodes in the Internet of Vehicles to authenticate vehicular data transactions without the need to store the entire historical blockchain data. 3) SLIM also integrates a Stackelberg game model for efficient resource distribution in collaborative networks. By offloading the mining process and data storage to the cloud and allowing roadside units to adjust their storage and computing strategies through the game model, the computing resource requirement of roadside units is thus substantially reduced, leading to optimized revenue generation. The security effectiveness of SLIM is theoretically proven. The simulations and experimental results demonstrate that SLIM’s block size is significantly smaller (66.9 times less) than Ethereum’s, and its block verification time is 67% faster compared to conventional stateless blockchains.

Frequent coauthors

  • Frédérique Biennier

    66 shared
  • Arthur Gatouillat

    Laboratoire d'Informatique en Images et Systèmes d'Information

    44 shared
  • Maroun Abi Assaf

    40 shared
  • Youssef Amghar

    Institut National des Sciences Appliquées de Lyon

    36 shared
  • Bertrand Massot

    Centre National de la Recherche Scientifique

    27 shared
  • Zakaria Maamar

    Qatar Science and Technology Park

    21 shared
  • Xiaoyang Zhu

    Institut National des Sciences Appliquées de Lyon

    20 shared
  • Robin G. Qiu

    Pennsylvania State University

    18 shared

Labs

  • Trustworthy Intelligence and Thinking Machine Lab (THINK Lab) · PI

Awards & honors

  • 2022-2023 Distinguished Research and Scholarship Award, Penn…
  • 2022-2023 Excellence in Teaching Award, Penn State Great Val…
  • 2022 Arthur L. Glenn Award for Excellence in Student Engagem…
  • 2020-2021 Award for Faculty Service, Penn State Great Valley…
  • 2019-2020 Arthur L. Glenn Award for Faculty Innovation, Penn…