Hamed Zamani

Associate Professor and CIIR Associate Director

University of Massachusetts Amherst · Information Retrieval

Active 2014–2026

h-index: 33
Citations: 4.2k
Papers: 226 (149 in the last 5 years)

About

Hamed Zamani is an Associate Professor at the Manning College of Information and Computer Sciences at the University of Massachusetts Amherst, where he also serves as the Associate Director of the Center for Intelligent Information Retrieval (CIIR). His research focuses on designing and evaluating statistical and machine learning models with applications to interactive information access systems, including search engines, recommender systems, and question answering. His current research interests include Neural Information Retrieval, Conversational Search, and Retrieval-Enhanced Machine Learning. Prior to his position at UMass, Zamani was a Researcher at Microsoft, working on a wide range of problems related to search engines. He received his Ph.D. in 2019 from UMass under the supervision of W. Bruce Croft, and was awarded the UMass CICS Outstanding Dissertation Award for his thesis on weakly supervised neural information retrieval. He holds M.Sc. and B.Sc. degrees from the University of Tehran. Zamani is actively involved in organizing workshops and conferences in the field, has received multiple awards including the ACM SIGIR Early Career Excellence in Research and Excellence in Community Engagement Awards, and has been recognized for his contributions to the research community.

Research topics

  • Computer Science
  • Machine Learning
  • Data Mining
  • Artificial Intelligence
  • Information Retrieval
  • World Wide Web
  • Data science

Selected publications

  • Evaluation of Agents under Simulated AI Marketplace Dynamics

    arXiv (Cornell University) · 2026-04-15

    preprint · Open access

    Modern information access ecosystems consist of mixtures of systems, such as retrieval systems and large language models, and increasingly rely on marketplaces to mediate access to models, tools, and data, making competition between systems inherent to deployment. In such settings, outcomes are shaped not only by benchmark quality but also by competitive pressure, including user switching, routing decisions, and operational constraints. Yet evaluation is still largely conducted on static benchmarks with accuracy-focused measures that assume systems operate in isolation. This mismatch makes it difficult to predict post-deployment success and obscures competitive effects such as early-adoption advantages and market dominance. We introduce Marketplace Evaluation, a simulation-based paradigm that evaluates information access systems as participants in a competitive marketplace. By simulating repeated interactions and evolving user and agent preferences, the framework enables longitudinal evaluation and marketplace-level metrics, such as retention and market share, that complement and can extend beyond traditional accuracy-based metrics. We formalize the framework and outline a research agenda, motivated by business and economics, around marketplace simulation, metrics, optimization, and adoption in evaluation campaigns like TREC.

  • Pathways of Thoughts: Multi-Directional Thinking for Long-form Personalized Question Answering

    2026-04-12

    article · Open access

    Personalization is well studied in search and recommendation, but personalized question answering remains underexplored due to challenges in inferring preferences from long, noisy, implicit contexts and generating responses that are both accurate and aligned with user expectations. To address this, we propose Pathways of Thoughts (PoT), an inference-stage method that applies to any large language model (LLM) without task-specific fine-tuning. PoT models the thinking as an iterative decision process, where the model dynamically selects among cognitive operations such as reasoning, revision, personalization, and clarification. This enables exploration of multiple reasoning trajectories, producing diverse candidate responses that capture different perspectives. PoT then aggregates and reweights these candidates according to inferred user preferences, yielding a final personalized response that benefits from the complementary strengths of diverse reasoning paths. Experiments on the LaMP-QA benchmark show that PoT consistently outperforms competitive baselines, achieving up to a 10.8% relative improvement. Human evaluation further validates these improvements, with annotators preferring PoT in 66% of cases compared to the best-performing baseline and reporting ties in 15% of cases.

  • Distillation and Refinement of Reasoning in Small Language Models for Document Re-ranking

    2025-07-18

    preprint · Open access · Senior author

    We present a novel approach for training small language models for reasoning-intensive document ranking that combines knowledge distillation with reinforcement learning optimization. While existing methods often rely on expensive human annotations or large black-box language models, our methodology leverages web data and a teacher LLM to automatically generate high-quality training examples with relevance explanations. By framing document ranking as a reinforcement learning problem and incentivizing explicit reasoning capabilities, we train a compact 3B parameter language model that achieves state-of-the-art performance on the BRIGHT benchmark. Our model ranks third on the leaderboard while using substantially fewer parameters than other approaches, outperforming models that are over 20 times larger. Through extensive experiments, we demonstrate that generating explanations during inference, rather than directly predicting relevance scores, enables more effective reasoning with smaller language models. The self-supervised nature of our method offers a scalable and interpretable solution for modern information retrieval systems.

  • Hypencoder: Hypernetworks for Information Retrieval

    2025-07-13 · 3 citations

    article · Open access · Senior author

    Existing information retrieval systems are largely constrained by their reliance on vector inner products to assess query-document relevance, which naturally limits the expressiveness of the relevance score they can produce. We propose a new paradigm: instead of representing a query as a vector, we use a small neural network that acts as a learned query-specific relevance function. This small neural network takes a document representation as input (in this work we use a single vector) and produces a scalar relevance score. To produce the small neural network we use a hypernetwork, a network that produces the weights of other networks, as our query encoder. We name this category of encoder models Hypencoders. Experiments on in-domain search tasks show that Hypencoders significantly outperform strong dense retrieval models and even surpass reranking models and retrieval models with an order of magnitude more parameters. To assess the extent of Hypencoders' capabilities, we evaluate on a set of hard retrieval tasks including tip-of-the-tongue and instruction-following retrieval tasks. On harder tasks, we find that the performance gap widens substantially compared to standard retrieval tasks. Furthermore, to demonstrate the practicality of our method, we implement an approximate search algorithm and show that our model is able to retrieve from a corpus of 8.8M documents in under 60 milliseconds.

  • Pre-Trained Models for Search and Recommendation: Introduction to the Special Issue—Part 2

    ACM Transactions on Information Systems · 2025-05-27

    article
  • Beyond a Million Tokens: Benchmarking and Enhancing Long-Term Memory in LLMs

    ArXiv.org · 2025-10-31

    preprint · Open access

    Evaluating the abilities of large language models (LLMs) for tasks that require long-term memory and thus long-context reasoning, for example in conversational settings, is hampered by the existing benchmarks, which often lack narrative coherence, cover narrow domains, and only test simple recall-oriented tasks. This paper introduces a comprehensive solution to these challenges. First, we present a novel framework for automatically generating long (up to 10M tokens), coherent, and topically diverse conversations, accompanied by probing questions targeting a wide range of memory abilities. From this, we construct BEAM, a new benchmark comprising 100 conversations and 2,000 validated questions. Second, to enhance model performance, we propose LIGHT, a framework inspired by human cognition that equips LLMs with three complementary memory systems: a long-term episodic memory, a short-term working memory, and a scratchpad for accumulating salient facts. Our experiments on BEAM reveal that even LLMs with 1M token context windows (with and without retrieval-augmentation) struggle as dialogues lengthen. In contrast, LIGHT consistently improves performance across various models, achieving an average improvement of 3.5%-12.69% over the strongest baselines, depending on the backbone LLM. An ablation study further confirms the contribution of each memory component.

  • Learning to Rank for Multiple Retrieval-Augmented Models through Iterative Utility Maximization

    2025-07-18

    article · Open access · Senior author

    This paper investigates the design of a unified search engine to serve multiple retrieval-augmented generation (RAG) agents, each with a distinct task, backbone large language model (LLM), and RAG strategy. We introduce an iterative approach where the search engine generates retrieval results for the RAG agents and gathers feedback on the quality of the retrieved documents during an offline phase. This feedback is then used to iteratively optimize the search engine using an expectation-maximization algorithm, with the goal of maximizing each agent's utility function. Additionally, we adapt this to an online setting, allowing the search engine to refine its behavior based on real-time feedback from individual agents to better serve results to each of them. Experiments on datasets from the Knowledge-Intensive Language Tasks (KILT) benchmark demonstrate that, on average, our approach significantly outperforms baselines across 18 RAG models. We demonstrate that our method effectively "personalizes" the retrieval for each RAG agent based on the collected feedback. Finally, we provide a comprehensive ablation study to explore various aspects of our method.

  • Reliable Annotations with Less Effort: Evaluating LLM-Human Collaboration in Search Clarifications

    2025-07-18

    preprint · Open access · Senior author

    Despite growing interest in using large language models (LLMs) to automate annotation, their effectiveness in complex, nuanced, and multi-dimensional labelling tasks remains relatively underexplored. This study focuses on annotation for the search clarification task, leveraging a high-quality, multi-dimensional dataset that includes five distinct fine-grained annotation subtasks. Although LLMs have shown impressive capabilities in general settings, our study reveals that even state-of-the-art models struggle to replicate human-level performance in subjective or fine-grained evaluation tasks. Through a systematic assessment, we demonstrate that LLM predictions are often inconsistent, poorly calibrated, and highly sensitive to prompt variations. To address these limitations, we propose a simple yet effective human-in-the-loop (HITL) workflow that uses confidence thresholds and inter-model disagreement to selectively involve human review. Our findings show that this lightweight intervention significantly improves annotation reliability while reducing human effort by up to 45%, offering a relatively scalable and cost-effective yet accurate path forward for deploying LLMs in real-world evaluation settings.

  • Open-Ended and Knowledge-Intensive Video Question Answering

    ArXiv.org · 2025-02-17

    preprint · Open access · Senior author

    Video question answering that requires external knowledge beyond the visual content remains a significant challenge in AI systems. While models can effectively answer questions based on direct visual observations, they often falter when faced with questions requiring broader contextual knowledge. To address this limitation, we investigate knowledge-intensive video question answering (KI-VideoQA) through the lens of multi-modal retrieval-augmented generation, with a particular focus on handling open-ended questions rather than just multiple-choice formats. Our comprehensive analysis examines various retrieval augmentation approaches using cutting-edge retrieval and vision language models, testing both zero-shot and fine-tuned configurations. We investigate several critical dimensions: the interplay between different information sources and modalities, strategies for integrating diverse multi-modal contexts, and the dynamics between query formulation and retrieval result utilization. Our findings reveal that while retrieval augmentation shows promise in improving model performance, its success is heavily dependent on the chosen modality and retrieval methodology. The study also highlights the critical role of query construction and retrieval depth optimization in effective knowledge integration. Through our proposed approach, we achieve a substantial 17.5% improvement in accuracy on multiple choice questions in the KnowIT VQA dataset, establishing new state-of-the-art performance levels.
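The Hypencoder entry above describes a query encoder that outputs a small neural network rather than a vector; that network then maps a document vector to a scalar relevance score. The following is a minimal sketch of that idea only, not the authors' implementation. It assumes a single linear hypernetwork with random, untrained weights and a one-hidden-layer scoring network without biases; the names `make_hypernetwork`, `encode_query`, and `rank` are hypothetical and chosen for illustration.

```python
import random


def make_hypernetwork(query_dim, doc_dim, hidden_dim, seed=0):
    """Return random (untrained) hypernetwork weights.

    Assumption: the hypernetwork is one linear map from the query vector to
    the flattened weights of the small scoring network (W1 followed by w2).
    """
    rng = random.Random(seed)
    n_out = doc_dim * hidden_dim + hidden_dim
    return [[rng.gauss(0.0, 0.1) for _ in range(query_dim)] for _ in range(n_out)]


def encode_query(hyper, query_vec, doc_dim, hidden_dim):
    """Turn a query vector into a query-specific relevance function."""
    # The hypernetwork output is the weight vector of the scoring network.
    flat = [sum(w * q for w, q in zip(row, query_vec)) for row in hyper]
    split = doc_dim * hidden_dim
    w1 = [flat[i * doc_dim:(i + 1) * doc_dim] for i in range(hidden_dim)]
    w2 = flat[split:]

    def relevance(doc_vec):
        # One hidden layer with ReLU, then a linear readout to a scalar.
        hidden = [max(0.0, sum(w * d for w, d in zip(row, doc_vec))) for row in w1]
        return sum(w * h for w, h in zip(w2, hidden))

    return relevance


def rank(relevance, docs):
    """Sort document ids by descending score under the query's network."""
    return sorted(docs, key=lambda doc_id: relevance(docs[doc_id]), reverse=True)


if __name__ == "__main__":
    hyper = make_hypernetwork(query_dim=3, doc_dim=4, hidden_dim=5, seed=1)
    score = encode_query(hyper, [1.0, 0.0, -1.0], doc_dim=4, hidden_dim=5)
    docs = {"d1": [0.1, 0.2, 0.3, 0.4], "d2": [0.9, -0.5, 0.0, 0.2]}
    for doc_id in rank(score, docs):
        print(doc_id, score(docs[doc_id]))
```

Unlike an inner product, the scoring function here is nonlinear in the document vector, which is the expressiveness argument the abstract makes. A real system would train the hypernetwork end to end and, as the abstract notes, pair it with an approximate search algorithm to avoid scoring every document.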

Frequent coauthors

  • Nick Craswell

    Microsoft

    70 shared
  • W. Bruce Croft

    University of Massachusetts Amherst

    45 shared
  • Bhaskar Mitra

    44 shared
  • Sebastian Hofstätter

    31 shared
  • Azadeh Shakery

    University of Tehran

    27 shared
  • Andrew McCallum

    University of Massachusetts Amherst

    19 shared
  • Gord Lueck

    Microsoft (United States)

    19 shared
  • Alireza Salemi

    University of Massachusetts Amherst

    17 shared

Education

  • Ph.D.

    University of Massachusetts Amherst

    2019
  • M.S.

    University of Tehran

  • B.S.

    University of Tehran

Awards & honors

  • ACM SIGIR Early Career Excellence in Research (2023)
  • ACM SIGIR Excellence in Community Engagement Award (2023)
  • Best Short Paper Award at ACM SIGIR 2024
  • Best Student Paper Award at ACM SIGIR 2023
  • Best Short Paper Award at ACM SIGIR 2022