Arlei Lopes da Silva

Assistant Professor of Computer Science · Member, Ken Kennedy Institute

Rice University · Computer Science

Active 2008–2025

h-index: 10

Citations: 656

Papers: 59 (25 in the last 5 years)

Funding: not listed

About

Arlei Lopes da Silva is an Assistant Professor of Computer Science at Rice University, holding a courtesy appointment in the Department of Electrical and Computer Engineering. He is also a member of the Ken Kennedy Institute for Responsible AI and Computing for Global Impact. His research focuses on developing algorithms and models for mining and learning from complex datasets, broadly defined as data science, with a particular emphasis on data represented as graphs and networks. His interests are motivated by problems in computational social science, infrastructure, and healthcare. To address these challenges, he applies tools from machine learning, network science, graph theory, linear algebra, optimization, and statistics. Professor Silva earned his Ph.D. in Computer Science from the University of California, Santa Barbara, where he was advised by Ambuj Singh and also served as a postdoctoral scholar. Prior to that, he obtained his B.Sc. and M.Sc. degrees in Computer Science from Universidade Federal de Minas Gerais in Brazil, under the supervision of Wagner Meira Jr. He has also been a visiting scholar at Rensselaer Polytechnic Institute, hosted by Mohammed J. Zaki.

Research topics

Artificial Intelligence
Computer Science
Theoretical computer science
Mathematics
Mathematical optimization
Combinatorics
Algorithm

Selected publications

TrafficPulse: A Road-Sensor Assisted Traffic Tweet Misinformation Detection System
2025-05-15
article · Senior author
Traffic incident detection is a well-established task in transportation, traditionally addressed using a combination of traffic sensors and driver reports. More recently, social media has become a rich data source for timely incident detection. However, the highly dynamic nature of social media, the challenges in mapping textual content to precise real-world locations, and potentially misleading posts complicate the extraction of reliable traffic information. Motivated by these challenges and leveraging recent advances in large language models (LLMs), we propose a real-time tweet validation pipeline that extracts and verifies traffic incidents reported on X (formerly Twitter). Our approach employs advanced parsing techniques for localization extraction. It integrates publicly available data to confirm the existence of an incident, thereby enhancing the robustness of downstream traffic analysis methods that combine sensor data with verified textual features. To support further research in this domain, we also introduce two new datasets: the Twitter Traffic Incidents dataset, which comprises manually curated and human-verified incident reports, and the PeMS Sensor + Incidents Reports dataset, featuring snapshots from California's PeMS traffic sensor system. Experimental results demonstrate that our pipeline significantly improves the reliability of traffic incident validation in tweets, serving as a basis for future traffic anomaly detection research.
Attribute-Enhanced Similarity Ranking for Sparse Link Prediction
2025-04-04 · 1 citation
article · Senior author
Link prediction is a fundamental problem in graph data. In its most realistic setting, the problem consists of predicting missing or future links between random pairs of nodes from the set of disconnected pairs. Graph Neural Networks (GNNs) have become the predominant framework for link prediction. GNN-based methods treat link prediction as a binary classification problem and handle the extreme class imbalance (real graphs are very sparse) by sampling (uniformly at random) a balanced number of disconnected pairs not only for training but also for evaluation. However, we show that the reported performance of GNNs for link prediction in the balanced setting does not translate to the more realistic imbalanced setting and that simpler topology-based approaches are often better at handling sparsity. These findings motivate Gelato, a similarity-based link-prediction method that applies (1) graph learning based on node attributes to enhance a topological heuristic, (2) a ranking loss for addressing class imbalance, and (3) a negative sampling scheme that efficiently selects hard training pairs via graph partitioning. Experiments show that Gelato outperforms existing GNN-based alternatives.
Cross-Domain Graph Anomaly Detection via Test-Time Training with Homophily-Guided Self-Supervision
ArXiv.org · 2025-02-20
preprint · Open access · Senior author
Graph Anomaly Detection (GAD) has demonstrated great effectiveness in identifying unusual patterns within graph-structured data. However, while labeled anomalies are often scarce in emerging applications, existing supervised GAD approaches are either ineffective or not applicable when moved across graph domains due to distribution shifts and heterogeneous feature spaces. To address these challenges, we present GADT3, a novel test-time training framework for cross-domain GAD. GADT3 combines supervised and self-supervised learning during training while adapting to a new domain during test time using only self-supervised learning by leveraging a homophily-based affinity score that captures domain-invariant properties of anomalies. Our framework introduces four key innovations to cross-domain GAD: an effective self-supervision scheme, an attention-based mechanism that dynamically learns edge importance weights during message passing, domain-specific encoders for handling heterogeneous features, and class-aware regularization to address imbalance. Experiments across multiple cross-domain settings demonstrate that GADT3 significantly outperforms existing approaches, achieving average improvements of over 8.2% in AUROC and AUPRC compared to the best competing model.
Attribute-Enhanced Similarity Ranking for Sparse Link Prediction
arXiv (Cornell University) · 2024-11-29
preprint · Open access · Senior author
Link prediction is a fundamental problem in graph data. In its most realistic setting, the problem consists of predicting missing or future links between random pairs of nodes from the set of disconnected pairs. Graph Neural Networks (GNNs) have become the predominant framework for link prediction. GNN-based methods treat link prediction as a binary classification problem and handle the extreme class imbalance (real graphs are very sparse) by sampling (uniformly at random) a balanced number of disconnected pairs not only for training but also for evaluation. However, we show that the reported performance of GNNs for link prediction in the balanced setting does not translate to the more realistic imbalanced setting and that simpler topology-based approaches are often better at handling sparsity. These findings motivate Gelato, a similarity-based link-prediction method that applies (1) graph learning based on node attributes to enhance a topological heuristic, (2) a ranking loss for addressing class imbalance, and (3) a negative sampling scheme that efficiently selects hard training pairs via graph partitioning. Experiments show that Gelato outperforms existing GNN-based alternatives.
FloodGNN-GRU: a spatio-temporal graph neural network for flood prediction
Environmental Data Science · 2024-01-01 · 29 citations
article · Open access · Senior author · Corresponding
Classical approaches for flood prediction apply numerical methods for the solution of partial differential equations that capture the physics of inundation processes (e.g., the 2D Shallow Water equations). However, traditional inundation models are still unable to satisfy the requirements of many relevant applications, including early-warning systems, high-resolution (or large spatial domain) simulations, and robust inference over distributions of inputs (e.g., rainfall events). Machine learning (ML) approaches are a promising alternative to physics-based models due to their ability to efficiently capture correlations between relevant inputs and outputs in a data-driven fashion. In particular, once trained, ML models can be tested/deployed much more efficiently than classical approaches. Yet, few ML-based solutions for spatio-temporal flood prediction have been developed, and their reliability/accuracy is poorly understood. In this paper, we propose FloodGNN-GRU, a spatio-temporal flood prediction model that combines a graph neural network (GNN) and a gated recurrent unit (GRU) architecture. Compared to existing approaches, FloodGNN-GRU (i) employs a graph-based model (GNN); (ii) operates on both spatial and temporal dimensions; and (iii) processes the water flow velocities as vector features, instead of scalar features. We evaluate FloodGNN-GRU using a LISFLOOD-FP simulation of Hurricane Harvey (2017) in Houston, Texas. Our results, based on several metrics, show that FloodGNN-GRU outperforms several data-driven alternatives in terms of accuracy. Moreover, our approach can be trained 100x faster and tested 1000x faster than the time required to run a comparable simulation. These findings illustrate the potential of ML-based methods to efficiently emulate physics-based inundation models, especially for short-term predictions.
Rearchitecting Datacenter Networks: A New Paradigm with Optical Core and Optical Edge
2024-05-20 · 3 citations
article
All-optical circuit-switching (OCS) technology is key to designing energy-efficient and high-performance datacenter network (DCN) architectures for the future. However, existing round-robin based OCS cores perform poorly under realistic workloads with high traffic skewness and a high volume of inter-rack traffic. To address this issue, we propose a novel DCN architecture, OSSV: a combination of an OCS-based core (between ToR switches) and an OCS-based reconfigurable edge (between servers and ToR switches). On one hand, the OCS core is traffic agnostic and realizes reconfigurably non-blocking ToR-level connectivity. On the other hand, the OCS-based edge reconfigures itself to reshape the incoming traffic in order to jointly minimize traffic skewness and inter-rack traffic volume. Our novel optimization framework can obtain the right balance between these intertwined objectives. Our extensive simulations and testbed evaluation show that OSSV can achieve high performance under diverse DCN traffic while consuming low power and incurring low cost.
Feature-based Individual Fairness in k-clustering
2023-05-30 · 1 citation
preprint · Open access
Ensuring fairness in machine learning algorithms is a challenging and essential task. We consider the problem of clustering a set of points while satisfying fairness constraints. While there have been several attempts to capture group fairness in the k-clustering problem, fairness at an individual level is not so well-studied. We introduce a new notion of individual fairness in k-clustering based on features not necessarily used for clustering. The problem is NP-hard and does not admit a constant factor approximation. Therefore, we design a randomized heuristic algorithm. Our experimental results against six competing baselines validate that our algorithm produces individually fairer clusters than the fairest baseline.
USDOT Tier-1 University Transportation Center for Advancing Cybersecurity Research and Education
2023-05-01
article
The Transportation Cybersecurity Center for Advanced Research and Education (CYBER-CARE) is a US Department of Transportation (USDOT) Tier-1 University Transportation Center (UTC) funded in 2023. CYBER-CARE primarily focuses on the USDOT statutory research priority area of “Reducing Transportation Cybersecurity Risks.” CYBER-CARE aims to establish a fundamental knowledge basis and explore advanced theory to mitigate the impacts of large-scale cyberattacks on transportation infrastructure and connected and automated vehicle (CAV) systems. The research projects at CYBER-CARE will develop conceptual frameworks, construct comprehensive datasets, explore novel analytical approaches, support the implementation of public policies and infrastructure investments, and build a high-quality industry workforce through education. All CYBER-CARE research projects can be organized into four thrusts: CAV cybersecurity, transportation data security, advanced traffic management system (ATMS) cybersecurity, and next-generation transportation cybersecurity systems. In addition, CYBER-CARE will accelerate industry collaborations, foster new technologies, and provide professionals with the skills and opportunities needed to become successful leaders in their fields. Notably, as CYBER-CARE will prioritize engagement with underrepresented minorities, these communities stand to benefit from professional development training in transportation cybersecurity.
Sensor Placement for Learning in Flow Networks
arXiv (Cornell University) · 2023-12-12
preprint · Open access · Senior author
Large infrastructure networks (e.g. for transportation and power distribution) require constant monitoring for failures, congestion, and other adversarial events. However, assigning a sensor to every link in the network is often infeasible due to placement and maintenance costs. Instead, sensors can be placed only on a few key links, and machine learning algorithms can be leveraged for the inference of missing measurements (e.g. traffic counts, power flows) across the network. This paper investigates the sensor placement problem for networks. We first formalize the problem under a flow conservation assumption and show that it is NP-hard to place a fixed set of sensors optimally. Next, we propose an efficient and adaptive greedy heuristic for sensor placement that scales to large networks. Our experiments, using datasets from real-world application domains, show that the proposed approach enables more accurate inference than existing alternatives from the literature. We demonstrate that considering even imperfect or incomplete ground-truth estimates can vastly improve the prediction error, especially when a small number of sensors is available.
Link Prediction without Graph Neural Networks
arXiv (Cornell University) · 2023-05-23 · 3 citations
preprint · Open access
Link prediction, which consists of predicting edges based on graph features, is a fundamental task in many graph applications. As for several related problems, Graph Neural Networks (GNNs), which are based on an attribute-centric message-passing paradigm, have become the predominant framework for link prediction. GNNs have consistently outperformed traditional topology-based heuristics, but what contributes to their performance? Are there simpler approaches that achieve comparable or better results? To answer these questions, we first identify important limitations in how GNN-based link prediction methods handle the intrinsic class imbalance of the problem (due to the graph sparsity) in their training and evaluation. Moreover, we propose Gelato, a novel topology-centric framework that applies a topological heuristic to a graph enhanced by attribute information via graph learning. Our model is trained end-to-end with an N-pair loss on an unbiased training set to address class imbalance. Experiments show that Gelato is 145% more accurate, trains 11 times faster, infers 6,000 times faster, and has less than half of the trainable parameters compared to state-of-the-art GNNs for link prediction.
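
The link prediction abstracts above note that results from a balanced evaluation do not carry over to the sparse, imbalanced setting. The arithmetic behind that point can be sketched as follows. This is only an illustration, not code from the papers: the function name `expected_precision`, the classifier rates, and the 1:1,000 class ratio are all hypothetical values chosen for the example.

```python
def expected_precision(tpr: float, fpr: float, positives: int, negatives: int) -> float:
    """Expected precision given true/false positive rates and class counts."""
    true_pos = tpr * positives    # linked pairs correctly predicted as links
    false_pos = fpr * negatives   # disconnected pairs wrongly predicted as links
    return true_pos / (true_pos + false_pos)


# Hypothetical classifier rates, chosen only for illustration.
TPR, FPR = 0.90, 0.05

# Balanced evaluation: one sampled disconnected pair per true link.
balanced = expected_precision(TPR, FPR, positives=1_000, negatives=1_000)

# Sparse evaluation: 1,000 disconnected pairs per true link (assumed ratio).
sparse = expected_precision(TPR, FPR, positives=1_000, negatives=1_000_000)

print(f"balanced precision: {balanced:.3f}")
print(f"sparse precision:   {sparse:.3f}")
```

With these assumed rates, the same classifier has a precision of about 0.95 on the balanced sample and about 0.02 on the sparse one, because false positives scale with the number of disconnected pairs.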

Frequent coauthors

Ambuj K. Singh · University of California, Santa Barbara · 30 shared
Wagner Meira · 15 shared
Sourav Medya · University of Illinois Chicago · 15 shared
Prithwish Basu · 8 shared
Ananthram Swami · 7 shared
Mohammed J. Zaki · Rensselaer Polytechnic Institute · 7 shared
Sara Guimarães · Universidade Federal do Rio Grande do Norte · 6 shared
Xuan-Hong Dang · 6 shared

Labs

Arlei Lopes da Silva's Lab · PI
Developing algorithms and models for mining and learning from complex datasets, especially for data represented as graphs/networks.

Awards & honors

SNAKDD Best Paper Runner-up (2013)
Best M.Sc. Thesis, Brazilian Computer Society (2011)
Best Undergraduate Research (top 6), Brazilian Computer Soci…
