
Alina Oprea
Northeastern University · Artificial Intelligence and Data Science
Active 2003–2025
About
Alina Oprea is a professor in the Khoury College of Computer Sciences at Northeastern University in Boston. Her research focuses on extracting meaningful intelligence from data sources for security applications, designing machine learning techniques to predict the behavior of sophisticated attackers, and protecting cloud infrastructures against emerging threats. She co-directs the Network and Distributed Systems Security Lab, which builds distributed systems and network protocols that achieve security, availability, and performance. Before joining Khoury College, Oprea was a research scientist at RSA Laboratories, where she worked on cloud security, applied cryptography, foundations of cybersecurity, and security analytics. She has co-authored numerous journal and conference papers, served on many technical program committees, and is a co-inventor on 20 patents. She is also an associate editor of the journal ACM Transactions on Privacy and Security. Her honors include the Best Paper Award at the 2005 Network and Distributed System Security Symposium (NDSS) and the MIT Technology Review TR35 award in 2011 for her research in cloud security.
Research topics
- Computer Science
- Data Mining
- Machine Learning
- Artificial Intelligence
- Computer Security
- Engineering
- Internet privacy
- Business
- World Wide Web
- Computer network
- Software engineering
- Accounting
Selected publications
Cascading Adversarial Bias from Injection to Distillation in Language Models
2025-11-19
article · Open access · Senior author
Model distillation has become essential for creating deployable language models, but their widespread deployment raises concerns about their resilience to adversarial manipulation. This paper investigates how adversaries can inject subtle biases into teacher models through minimal data poisoning during training; these biases propagate to a smaller distilled student model and become significantly amplified. We identify two propagation modes: Untargeted (affecting multiple tasks) and Targeted (focusing on a specific task while maintaining normal behavior elsewhere). With only 25 poisoned samples (0.25% poisoning rate), student models generate biased responses 76.9% of the time in targeted scenarios, versus 69.4% for teachers, while untargeted propagation yields a 5.7×–29.2× higher adversarial bias rate in students on unseen tasks. We validate across six bias types (targeted advertisement, phishing link, narrative manipulations, insecure coding practices), various distillation methods, and text/code generation modalities. Current defense mechanisms, including perplexity filtering, bias detection systems, and LLM-based autoraters, prove inadequate against these attacks. We propose practical design principles for building effective adversarial bias mitigation strategies to address this threat vector.
Computer Networks · 2025-09-24
article · Open access
The development of Open Radio Access Network (RAN) cellular systems is being propelled by the integration of Artificial Intelligence (AI) techniques. While AI can enhance network performance, it expands the attack surface of the RAN. For instance, the need for datasets to train AI algorithms and the use of open interfaces to retrieve data in real time pave the way for data tampering during both the training and inference phases. In this work, we propose MalO-RAN, a framework to evaluate the impact of data poisoning on O-RAN intelligent applications. We focus on AI-based xApps that make control decisions via Deep Reinforcement Learning (DRL), and investigate backdoor attacks, in which tampered data is added to training datasets to embed a backdoor in the final model; the attacker can then use this backdoor to trigger potentially harmful or inefficient predefined control decisions. We leverage an extensive O-RAN dataset collected on the Colosseum network emulator and show how an attacker may tamper with the training of AI models embedded in xApps, with the goal of favoring specific tenants after the application is deployed on the network. We experimentally evaluate the impact of the SleeperNets and TrojDRL attacks and show that backdoor attacks achieve an attack success rate of up to 0.9. Moreover, we demonstrate the impact of these attacks on a live O-RAN deployment implemented on Colosseum, where we instantiate the xApps poisoned with MalO-RAN on an O-RAN-compliant Near-real-time RAN Intelligent Controller (RIC). Results show that these attacks cause an average network performance degradation of 87%.
PoolFlip: A Multi-agent Reinforcement Learning Security Environment for Cyber Defense
Lecture Notes in Computer Science · 2025-10-11 · 1 citation
book-chapter
R1dacted: Investigating Local Censorship in DeepSeek's R1 Language Model
ArXiv.org · 2025-05-19
preprint · Open access
DeepSeek recently released R1, a high-performing large language model (LLM) optimized for reasoning tasks. Despite its efficient training pipeline, R1 achieves competitive performance, even surpassing leading reasoning models such as OpenAI's o1 on several benchmarks. However, emerging reports suggest that R1 refuses to answer certain prompts related to topics that are politically sensitive in China. While existing LLMs often implement safeguards to avoid generating harmful or offensive outputs, R1 represents a notable shift: it exhibits censorship-like behavior on politically charged queries. In this paper, we investigate this phenomenon by first introducing a large-scale, heavily curated set of prompts, covering a range of politically sensitive topics, that R1 censors but other models do not. We then conduct a comprehensive analysis of R1's censorship patterns, examining their consistency, triggers, and variations across topics, prompt phrasing, and context. Beyond English-language queries, we explore censorship behavior in other languages. We also investigate whether censorship transfers to models distilled from R1. Finally, we propose techniques for bypassing or removing this censorship. Our findings point to possible additional censorship mechanisms, likely shaped by design choices during training or alignment, raising concerns about transparency, bias, and governance in language model deployment.
Cascading Adversarial Bias from Injection to Distillation in Language Models
ArXiv.org · 2025-05-30
preprint · Open access · Senior author
Model distillation has become essential for creating smaller, deployable language models that retain the capabilities of larger systems. However, widespread deployment raises concerns about resilience to adversarial manipulation. This paper investigates the vulnerability of distilled models to adversarial injection of biased content during training. We demonstrate that adversaries can inject subtle biases into teacher models through minimal data poisoning; these biases propagate to student models and become significantly amplified. We propose two propagation modes: Untargeted Propagation, where bias affects multiple tasks, and Targeted Propagation, which focuses on specific tasks while maintaining normal behavior elsewhere. With only 25 poisoned samples (0.25% poisoning rate), student models generate biased responses 76.9% of the time in targeted scenarios, higher than the 69.4% observed in teacher models. For untargeted propagation, adversarial bias appears 6×–29× more frequently in student models on unseen tasks. We validate our findings across six bias types (targeted advertisements, phishing links, narrative manipulations, insecure coding practices), various distillation methods, and modalities spanning text and code generation. Our evaluation reveals shortcomings in current defenses (perplexity filtering, bias detection systems, and LLM-based autorater frameworks) against these attacks. The results expose significant security vulnerabilities in distilled models and highlight the need for specialized safeguards. We propose practical design principles for building effective adversarial bias mitigation strategies.
Model-agnostic clean-label backdoor mitigation in cybersecurity environments
2025-10-06
article · Senior author
The training phase of machine learning models is a delicate step, especially in cybersecurity contexts. Recent research has surfaced a series of insidious training-time attacks that inject backdoors into models designed for security classification tasks without altering the training labels. In this work, we propose new techniques that leverage insights from cybersecurity threat models to effectively mitigate these clean-label poisoning attacks while preserving model utility. By performing density-based clustering on a carefully chosen feature subspace and progressively isolating suspicious clusters through a novel iterative scoring procedure, our defensive mechanism can mitigate the attacks without requiring many of the assumptions common in the existing backdoor defense literature.
Text-to-Image Models Leave Identifiable Signatures: Implications for Leaderboard Security
ArXiv.org · 2025-10-07
preprint · Open access
Generative AI leaderboards are central to evaluating model capabilities but remain vulnerable to manipulation. A key adversarial objective is rank manipulation, in which an attacker must first deanonymize the models behind the displayed outputs, a threat previously demonstrated and explored for large language models (LLMs). We show that this problem can be even more severe for text-to-image leaderboards, where deanonymization is markedly easier. Using over 150,000 images generated from 280 prompts by 19 diverse models spanning multiple organizations, architectures, and sizes, we demonstrate that simple real-time classification in CLIP embedding space identifies the generating model with high accuracy, even without prompt control or historical data. We further introduce a prompt-level separability metric and identify prompts that enable near-perfect deanonymization. Our results indicate that rank manipulation on text-to-image leaderboards is easier than previously recognized, underscoring the need for stronger defenses.
Riddle Me This! Stealthy Membership Inference for Retrieval-Augmented Generation
2025-11-19
article · Open access
Retrieval-Augmented Generation (RAG) enables large language models (LLMs) to generate grounded responses by leveraging external knowledge databases without altering model parameters. Although the absence of weight tuning prevents leakage via model parameters, it introduces the risk of inference adversaries exploiting the retrieved documents placed in the model's context. Existing methods for membership inference and data extraction often rely on jailbreaking or carefully crafted unnatural queries, which can be easily detected or thwarted by the query rewriting techniques common in RAG systems. In this work, we present a membership inference technique targeting documents in the RAG datastore. By crafting natural-text queries that can be answered only when the target document is present, our approach achieves successful inference with just 30 queries while remaining stealthy: straightforward detectors flag adversarial prompts from existing methods up to ~76× more frequently than those generated by our attack. We observe a 2× improvement in TPR@1%FPR over prior inference attacks across diverse RAG configurations, all at a cost of less than $0.02 per document inference.
SleeperNets: Universal Backdoor Poisoning Attacks Against Reinforcement Learning Agents
arXiv (Cornell University) · 2024-05-30
preprint · Open access · Senior author
Reinforcement learning (RL) is an actively growing field that is seeing increased use in real-world, safety-critical applications, making it paramount to ensure the robustness of RL algorithms against adversarial attacks. In this work, we explore a particularly stealthy form of training-time attack against RL: backdoor poisoning. Here the adversary intercepts the training of an RL agent with the goal of reliably inducing a particular action when the agent observes a predetermined trigger at inference time. We uncover theoretical limitations of prior work by proving that those attacks cannot generalize across domains and MDPs. Motivated by this, we formulate a novel poisoning attack framework that interlinks the adversary's objectives with those of finding an optimal policy, guaranteeing attack success in the limit. Using insights from our theoretical analysis, we develop SleeperNets, a universal backdoor attack that exploits a newly proposed threat model and leverages dynamic reward poisoning techniques. We evaluate our attack in 6 environments spanning multiple domains and demonstrate significant improvements in attack success over existing methods while preserving benign episodic return.
2024-05-19 · 2 citations
article
Dropout is a common operator in deep learning that aims to prevent overfitting by randomly dropping neurons during training. This paper introduces DROPOUTATTACK, a new family of poisoning attacks against neural networks. DROPOUTATTACK targets the dropout operator by manipulating which neurons are dropped instead of selecting them uniformly at random. We design, implement, and evaluate four DROPOUTATTACK variants that cover a broad range of scenarios. These attacks can slow or stop training, destroy the prediction accuracy of target classes, and sabotage either the precision or the recall of a target class. In our experiments training a VGG-16 model on CIFAR-100, our attack can reduce the precision of the victim class by 34.6 percentage points (81.7% → 47.1%) without degrading overall model accuracy.
Frequent coauthors
- Matthew Jagielski, Google (United States) · 25 shared
- Battista Biggio, University of Cagliari · 20 shared
- Cristina Nita-Rotaru, Northeastern University · 19 shared
- Ari Juels · 18 shared
- Simona Boboila, Northeastern University · 17 shared
- Fabio Roli, University of Cagliari · 15 shared
- Giorgio Severi · 14 shared
- Kevin D. Bowers · 13 shared
Awards & honors
- Best Paper Award at the 2005 Network and Distributed System…
- Technology Review TR35 award (2011)