Razieh (Negin) Rahimi

Assistant Professor · Verified

University of Massachusetts Amherst · Manning College of Information and Computer Sciences

Active 2009–2026

h-index: 9

Citations: 231

Papers: 37 (22 in the last 5 years)

Funding: not listed

About

Razieh (Negin) Rahimi is an Assistant Professor at the Manning College of Information and Computer Sciences at the University of Massachusetts Amherst. Her research group focuses on building agentic intelligent systems that enable general-purpose and personalized access to heterogeneous information through learning and interaction. She is also affiliated with the Center for Intelligent Information Retrieval. Her current research directions include long-horizon reasoning in agentic AI, self-evolving agents, and reasoning-aware retrieval for augmented large language models. Dr. Rahimi has received several honors and awards, including the NSF CAREER Award in 2024, the Google Research Scholar Award in 2022, the Translational Seed Award in 2020, and the Amazon Research Award in 2019.

Research topics

Artificial Intelligence
Information Retrieval
Computer Science
Natural Language Processing
Mathematics

Selected publications

PRIME: Planning and Retrieval-Integrated Memory for Enhanced Reasoning
Proceedings of the AAAI Conference on Artificial Intelligence · 2026-03-14
article · Open access
Inspired by the dual-process theory of human cognition from Thinking, Fast and Slow, we introduce PRIME (Planning and Retrieval-Integrated Memory for Enhanced Reasoning), a multi-agent reasoning framework that dynamically integrates System 1 (fast, intuitive thinking) and System 2 (slow, deliberate thinking). PRIME first employs a Quick Thinking Agent to generate a rapid answer; if uncertainty is detected, it then triggers a structured System 2 reasoning pipeline composed of specialized agents for planning, hypothesis generation, retrieval, information integration, and decision-making. This multi-agent design mimics human cognitive processes faithfully and enhances both efficiency and accuracy. Experimental results with LLaMA 3 models demonstrate that PRIME enables open-source LLMs to perform competitively with state-of-the-art closed-source models like GPT-4 and GPT-4o on benchmarks requiring multi-hop and knowledge-grounded reasoning. This research establishes PRIME as a scalable solution for improving LLMs in domains requiring complex, knowledge-intensive reasoning.
RaDeR: Reasoning-aware Dense Retrieval Models
ArXiv.org · 2025-05-23
preprint · Open access · Senior author
We propose RaDeR, a set of reasoning-based dense retrieval models trained with data derived from mathematical problem solving using large language models (LLMs). Our method leverages retrieval-augmented reasoning trajectories of an LLM and self-reflective relevance evaluation, enabling the creation of both diverse and hard-negative samples for reasoning-intensive relevance. RaDeR retrievers, trained for mathematical reasoning, effectively generalize to diverse reasoning tasks in the BRIGHT and RAR-b benchmarks, consistently outperforming strong baselines in overall performance. Notably, RaDeR achieves significantly higher performance than baselines on the Math and Coding splits. In addition, RaDeR presents the first dense retriever that outperforms BM25 when queries are Chain-of-Thought reasoning steps, underscoring the critical role of reasoning-based retrieval to augment reasoning language models. Furthermore, RaDeR achieves comparable or superior performance while using only 2.5% of the training data used by the concurrent work REASONIR, highlighting the quality of our synthesized training data.
RaDeR: Reasoning-aware Dense Retrieval Models
2025-01-01 · 1 citation
article · Open access · Senior author
We propose RaDeR, a set of reasoning-based dense retrieval models trained with data derived from mathematical problem solving using large language models (LLMs). Our method leverages retrieval-augmented reasoning trajectories of an LLM and self-reflective relevance evaluation, enabling the creation of both diverse and hard-negative samples for reasoning-intensive relevance. RaDeR retrievers, trained for mathematical reasoning, effectively generalize to diverse reasoning tasks in the BRIGHT and RAR-b benchmarks, consistently outperforming strong baselines in overall performance. Notably, RaDeR achieves significantly higher performance than baselines on the Math and Coding splits. In addition, RaDeR presents the first dense retriever that outperforms BM25 when queries are Chain-of-Thought reasoning steps, underscoring the critical role of reasoning-based retrieval to augment reasoning language models. Furthermore, RaDeR achieves comparable or superior performance while using only 2.5% of the training data used by the concurrent work REASONIR, highlighting the quality of our synthesized training data. Our code, data, and retrieval models are publicly available.
Discovering Biases in Information Retrieval Models Using Relevance Thesaurus as Global Explanation
2024-01-01
article · Open access
Discovering Biases in Information Retrieval Models Using Relevance Thesaurus as Global Explanation
arXiv (Cornell University) · 2024-10-04
preprint · Open access
Most efforts in interpreting neural relevance models have focused on local explanations, which explain the relevance of a document to a query but are not useful in predicting the model's behavior on unseen query-document pairs. We propose a novel method to globally explain neural relevance models by constructing a "relevance thesaurus" containing semantically relevant query and document term pairs. This thesaurus is used to augment lexical matching models such as BM25 to approximate the neural model's predictions. Our method involves training a neural relevance model to score the relevance of partial query and document segments, which is then used to identify relevant terms across the vocabulary space. We evaluate the obtained thesaurus explanation based on ranking effectiveness and fidelity to the target neural ranking model. Notably, our thesaurus reveals the existence of brand name bias in ranking models, demonstrating one advantage of our explanation method.
PaRaDe: Passage Ranking using Demonstrations with LLMs
2023-01-01 · 10 citations
article · Open access
Andrew Drozdov, Honglei Zhuang, Zhuyun Dai, Zhen Qin, Razieh Rahimi, Xuanhui Wang, Dana Alon, Mohit Iyyer, Andrew McCallum, Donald Metzler, Kai Hui. Findings of the Association for Computational Linguistics: EMNLP 2023. 2023.
PaRaDe: Passage Ranking using Demonstrations with Large Language Models
arXiv (Cornell University) · 2023-10-22 · 2 citations
preprint · Open access
Recent studies show that large language models (LLMs) can be instructed to effectively perform zero-shot passage re-ranking, in which the results of a first stage retrieval method, such as BM25, are rated and reordered to improve relevance. In this work, we improve LLM-based re-ranking by algorithmically selecting few-shot demonstrations to include in the prompt. Our analysis investigates the conditions where demonstrations are most helpful, and shows that adding even one demonstration is significantly beneficial. We propose a novel demonstration selection strategy based on difficulty rather than the commonly used semantic similarity. Furthermore, we find that demonstrations helpful for ranking are also effective at question generation. We hope our work will spur more principled research into question generation and passage ranking.
Rank-LIME: Local Model-Agnostic Feature Attribution for Learning to Rank
2023-08-09 · 17 citations
article
Understanding why a model makes certain predictions is crucial when adapting it for real world decision making. LIME is a popular model-agnostic feature attribution method for the tasks of classification and regression. However, the task of learning to rank in information retrieval is more complex in comparison with either classification or regression. In this work, we extend LIME to propose Rank-LIME, a model-agnostic, local, post-hoc linear feature attribution method for the task of learning to rank that generates explanations for ranked lists. We employ novel correlation-based perturbations, differentiable ranking loss functions and introduce new metrics to evaluate ranking based additive feature attribution models. We compare Rank-LIME with a variety of competing systems, with models trained on the MS MARCO datasets and observe that Rank-LIME outperforms existing explanation algorithms in terms of Model Fidelity and Explain-NDCG. With this we propose one of the first algorithms to generate additive feature attributions for explaining ranked lists.
Conditional Natural Language Inference
2023-01-01 · 1 citation
article · Open access
To properly explain sentence pairs that provide contradictory (different) information for different conditions, we introduce the task of conditional natural language inference (Cond-NLI) and focus on automatically extracting contradictory aspects and their conditions from a sentence pair. Cond-NLI can help to provide a full spectrum of information, such as when there are multiple answers to a question each addressing a specific condition, or reviews with different opinions for different conditions. We show that widely-used feature-attribution explanation models are not suitable for finding conditions, especially when sentences are long and are written independently. We propose a simple yet effective model for the original NLI task that can successfully extract conditions while not requiring token-level annotations. Our model enhances the interpretability of the NLI task while maintaining comparable accuracy. To evaluate models for Cond-NLI, we build and release a token-level annotated dataset, BioClaim, which contains potentially contradictory claims from the biomedical domain. Our experiments show that our proposed model outperforms the full cross-encoder and other baselines in extracting conditions. It also performs on par with GPT-3, which has an order of magnitude more parameters and was trained on a huge amount of data.
Search Result Diversification Using Query Aspects as Bottlenecks
2023-10-21 · 4 citations
article
We address some of the limitations of coverage-based search result diversification models, which often consist of separate components and rely on external systems for query aspects. To overcome these challenges, we introduce an end-to-end learning framework called DUB. Our approach preserves the intrinsic interpretability of coverage-based methods while enhancing diversification performance. Drawing inspiration from the information bottleneck method, we propose an aspect extractor that generates query aspect embeddings optimized as information bottlenecks for the task of diversified document re-ranking. Experimental results demonstrate that DUB outperforms state-of-the-art diversification models.

Frequent coauthors

James Allan
University of Massachusetts Amherst
19 shared
Azadeh Shakery
University of Tehran
8 shared
Andrew McCallum
University of Massachusetts Amherst
7 shared
Youngwoo Kim
7 shared
Mohit Iyyer
7 shared
Hamed Zamani
7 shared
Andrew Drozdov
7 shared
Shufan Wang
Shanghai Huayi Group (China)
5 shared

Labs

Center for Intelligent Information Retrieval · PI

Awards & honors

NSF CAREER Award (2024)
Google Research Scholar Award (2022)
Translational Seed Award (2020)
Amazon Research Award (2019)