Marcus Botacin

Assistant Professor, Computer Science & Engineering · Verified

Texas A&M University · Computer Science & Engineering

Active 2014–2025

h-index: 12
Citations: 362
Papers: 67 (39 in the last 5 years)
Funding: $531k (1 active)
About

Marcus Botacin is a computer security researcher focusing on malware analysis, evasion, and detection; sandbox development; antivirus operation; hardware-assisted security solutions; and reverse engineering.

Research topics

  • Computer Science
  • Computer Security
  • Artificial Intelligence
  • Data Science
  • Machine Learning
  • Data Mining
  • Software Engineering
  • Programming Language

Selected publications

  • Towards Explainable Drift Detection and Early Retrain in ML-Based Malware Detection Pipelines

    Lecture Notes in Computer Science · 2025-01-01

    book chapter · Senior author
  • ML-Based Behavioral Malware Detection Is Far From a Solved Problem

    2025-04-09 · 3 citations

    article · Open access

    Malware detection is a ubiquitous application of Machine Learning (ML) in security. In behavioral malware analysis, the detector relies on features extracted from program execution traces. The research literature has focused on detectors trained with features collected from sandbox environments and evaluated on samples also analyzed in a sandbox. However, in deployment, a malware detector at endpoint hosts often must rely on traces captured from endpoint hosts, not from a sandbox. Thus, there is a gap between the literature and real-world needs. We present the first measurement study of the performance of ML-based malware detectors at real-world endpoints. Leveraging a dataset of sandbox traces and a dataset of in-the-wild program traces, we evaluate two scenarios: (i) an endpoint detector trained on sandbox traces (convenient and easy to train), and (ii) an endpoint detector trained on endpoint traces (more challenging to train, since we need to collect telemetry data). We discover a wide gap between the performance as measured using prior evaluation methods in the literature—over 90%—vs. expected performance in endpoint detection—about 20% (scenario (i)) to 50% (scenario (ii)). We characterize the ML challenges that arise in this domain and contribute to this gap, including label noise, distribution shift, and spurious features. Moreover, we show several techniques that achieve 5–30% relative performance improvements over the baselines. Our evidence suggests that applying detectors trained on sandbox data to endpoint detection is challenging. The most promising direction is training detectors directly on endpoint data, which marks a departure from current practice. To promote progress, we will facilitate researchers to perform realistic detector evaluations against our real-world dataset.

  • Making Acoustic Side-Channel Attacks on Noisy Keyboards Viable with LLM-Assisted Spectrograms' "Typo" Correction

    arXiv · 2025-04-15

    preprint · Open access · Senior author

    The large integration of microphones into devices increases the opportunities for Acoustic Side-Channel Attacks (ASCAs), as these can be used to capture keystrokes' audio signals that might reveal sensitive information. However, the current State-Of-The-Art (SOTA) models for ASCAs, including Convolutional Neural Networks (CNNs) and hybrid models, such as CoAtNet, still exhibit limited robustness under realistic noisy conditions. Solving this problem requires either: (i) an increased model's capacity to infer contextual information from longer sequences, allowing the model to learn that an initially noisily typed word is the same as a futurely collected non-noisy word, or (ii) an approach to fix misidentified information from the contexts, as one does not type random words, but the ones that best fit the conversation context. In this paper, we demonstrate that both strategies are viable and complementary solutions for making ASCAs practical. We observed that no existing solution leverages advanced transformer architectures' power for these tasks and propose that: (i) Visual Transformers (VTs) are the candidate solutions for capturing long-term contextual information and (ii) transformer-powered Large Language Models (LLMs) are the candidate solutions to fix the "typos" (mispredictions) the model might make. Thus, we here present the first-of-its-kind approach that integrates VTs and LLMs for ASCAs. We first show that VTs achieve SOTA performance in classifying keystrokes when compared to the previous CNN benchmark. Second, we demonstrate that LLMs can mitigate the impact of real-world noise. Evaluations on the natural sentences revealed that: (i) incorporating LLMs (e.g., GPT-4o) in our ASCA pipeline boosts the performance of error-correction tasks; and (ii) the comparable performance can be attained by a lightweight, fine-tuned smaller LLM (67 times smaller than GPT-4o), using...

  • Cross-Regional Malware Detection via Model Distilling and Federated Learning

    2024-09-29 · 4 citations

    article · Open access · 1st author · Corresponding

    Machine Learning (ML) is a key part of modern malware detection pipelines, but its application is not straightforward. It involves multiple practical challenges that are frequently unaddressed by the literature works. A key challenge is the heterogeneity of scenarios. Antivirus (AV) companies for instance operate under different performance constraints in the backend and in the endpoint, and with a diversity of datasets according to the country they operate in. In this paper, we evaluate the impact of these heterogeneous aspects by developing a classification pipeline for 3 datasets of 10K malware samples each collected by an AV company in the USA, Brazil, and Japan in the same period. We characterize the different requirements for these datasets and we show that a different number of features is required to reach the optimal detection rate in each scenario. We show that a global model combining the three datasets increases the detection of the three individual datasets. We propose using Federated Learning (FL) to build the global model and a distilling process to generate the local versions. We order the samples temporally to show that although retraining on concept drift detection helps recover the detection rate, only a FL approach can increase the detection rate.

  • What do malware analysts want from academia? A survey on the state-of-the-practice to guide research developments

    2024-09-29

    article · Open access · 1st author · Corresponding

    Malware analysis tasks are as fundamental for modern cybersecurity as they are challenging to perform. More than depending on any tool capability, malware analysis tasks depend on human analysts’ abilities, experiences, and practices when using the tools. Academic research has traditionally been focused on producing solutions to overcome malware analysis technical challenges, but are these solutions adopted in practice by malware analysts? Are these solutions useful? If not, how can the academic community improve its practices to foster adoption and cause a greater impact? To answer these questions, we surveyed 21 professional malware analysts working in different companies, from CSIRTs to AV companies, to hear their opinions about existing tools, practices, and the challenges they face in their daily tasks. In 31 questions, we cover a broad range of aspects, from the number of observed malware variants to the use of public sandboxes and the tools the analysts would like to exist to make their lives easier. We aim to bridge the gap between academic developments and malware practices. To do so, on the one hand, we suggest to the analysts the solutions proposed in the literature that could be integrated into their practices. On the other hand, we also point out to the academic community possible future directions to bridge existing development gaps that significantly affect malware analysis practices.

  • Fuzzing and Symbolic Execution for Multipath Malware Tracing: Bridging Theory and Practice via Survey and Experiments

    Digital Threats: Research and Practice · 2024-10-11 · 1 citation

    article · Open access · 1st author · Corresponding

    In real life, distinct runs of the same artifact lead to the exploration of different paths, due to either system’s natural randomness or malicious constructions. These variations might completely change execution outcomes (extreme case). Thus, to analyze malware beyond theoretical models, we must consider the execution of multiple paths. The academic literature presents many approaches for multipath analysis (e.g., fuzzing, symbolic, and concolic executions), but it still fails to answer What’s the current state of multipath malware tracing? This work aims to answer this question and also to point out What developments are still required to make them practical? Thus, we present a literature survey and perform experiments to bridge theory and practice. Our results show that (i) natural variation is frequent; (ii) fuzzing helps to discover more paths; (iii) fuzzing can be guided to increase coverage; (iv) forced execution maximizes path discovery rates; (v) pure symbolic execution is impractical, and (vi) concolic execution is promising but still requires further developments.

  • On the uniqueness of AntiVirus labels: How many labels do we need to fingerprint an AV?

    Journal of Computer Virology and Hacking Techniques · 2024-11-22

    article · 1st author · Corresponding
  • The Use of the DWARF Debugging Format for the Identification of Potentially Unwanted Applications (PUAs) in WebAssembly Binaries

    2024-01-01 · 1 citation

    article · Open access
  • Towards more realistic evaluations: The impact of label delays in malware detection pipelines

    Computers & Security · 2024-09-19 · 5 citations

    article · 1st author · Corresponding
  • Uma Estratégia Dinâmica para a Detecção de Anomalias em Binários WebAssembly [A Dynamic Strategy for Anomaly Detection in WebAssembly Binaries]

    2023-09-18

    article · Open access

    WebAssembly is a low-level binary format that provides a compilation target for high-level languages. Offering greater security for Web users through a binary instruction format, WebAssembly is supported by more than 95% of Web browsers. However, the growth in WebAssembly usage has raised concerns about its security and its possible malicious use. Since WebAssembly is a low-level instruction format, it becomes essential to identify the purpose of the developed code by extracting its features. The use of WebAssembly for cryptojacking attacks and for obfuscating malicious code is frequently observed. In this context, this work presents a strategy for identifying anomalies in WebAssembly binaries through feature extraction and static analysis. The strategy proposed in this paper achieved an F1 score of 99.3%, demonstrating its potential.

Frequent coauthors

  • André Grégio

    54 shared
  • Paulo Lício de Geus

    39 shared
  • Fabrício Ceschin

    Georgia Institute of Technology

    14 shared
  • Heitor Murilo Gomes

    8 shared
  • Lucas Galante

    7 shared
  • Daniela S Oliveira

    University of Florida

    5 shared
  • Ruimin Sun

    Florida International University

    4 shared
  • Luiz S. Oliveira

    4 shared

Education

  • Ph.D., Computer Science

    Federal University of Paraná (UFPR-Brazil)

    2021
  • M.S., Computer Science

    University of Campinas (UNICAMP-Brazil)

    2017
  • B.S., Computer Engineering

    University of Campinas (UNICAMP-Brazil)

    2015

Awards & honors

  • Outstanding Alumnus - DInf/UFPR - 2025
  • Top-3 Best PhD Thesis in Security - Brazilian Computer Socie…
  • Best PhD Thesis - Informatics Department/UFPR - 2022
  • Best Master Dissertation in Security - 1st place - Brazilian…
  • Best Master Dissertation - Institute of Computing/UNICAMP -…