Presenter: Ata Cinar Genc
Group Members: Emir Kaan Korukluoglu
Faculty Sponsor: James Allan
School: UMass Amherst
Research Area: Computer Science
ABSTRACT
Modern Information Retrieval (IR) systems typically use a "retrieve-then-rerank" pipeline, in which a computationally expensive, predetermined cross-encoder re-ranks the top results from a fast initial retriever. While effective, this approach applies a heavy re-ranking model regardless of query complexity, which adds latency and wastes computation on simple queries. On our test set, only 40% of queries gained accuracy from a re-ranker, and only 11% gained more from a heavier re-ranker than from a lighter one. We propose Adaptive Re-Ranking, a framework that dynamically routes each query to the most cost-effective strategy based on query complexity: sparse retrieval (BM25), dense re-ranking (MiniLM-L6-v2), or heavy neural re-ranking (BGE-v2-m3). To train our routing classifier, we introduce a novel utility function that labels each query by its retrieval effectiveness (a combination of nDCG@10 and MRR@10), penalized by latency. We curate a training dataset of over 300,000 labeled queries from diverse BEIR benchmarks. Compared to BGE, our method achieves roughly 1.15x to 53x lower median latency and 1.11x to 3.52x lower mean latency across all datasets we tested, with effectiveness changing by -19% to +5%. Oracle results show that adaptive re-ranking can outperform any fixed strategy. Our findings show that routing queries with our utility function offers a scalable way to reduce computational cost and latency across a variety of IR systems.
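
The labeling step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact form of the utility function, so everything below is an assumption made for illustration: the linear combination of nDCG@10 and MRR@10, the weight `alpha`, the linear `latency_penalty`, and the names `StrategyResult`, `utility`, and `label_query` are all hypothetical, and the metric and latency values are made-up numbers, not results from the study.

```python
from dataclasses import dataclass


@dataclass
class StrategyResult:
    """Per-query outcome of one strategy (hypothetical container)."""
    name: str
    ndcg_at_10: float
    mrr_at_10: float
    latency_s: float


def utility(r: StrategyResult, alpha: float = 0.5,
            latency_penalty: float = 0.1) -> float:
    # Assumed form: weighted mix of nDCG@10 and MRR@10, minus a
    # linear latency penalty. The real function may differ.
    effectiveness = alpha * r.ndcg_at_10 + (1 - alpha) * r.mrr_at_10
    return effectiveness - latency_penalty * r.latency_s


def label_query(results: list[StrategyResult], alpha: float = 0.5,
                latency_penalty: float = 0.1) -> str:
    # The training label is the strategy with the highest utility.
    best = max(results, key=lambda r: utility(r, alpha, latency_penalty))
    return best.name


# Illustrative (made-up) numbers for an easy and a hard query.
easy_query = [
    StrategyResult("bm25", 0.90, 1.00, 0.01),
    StrategyResult("minilm", 0.90, 1.00, 0.05),
    StrategyResult("bge", 0.92, 1.00, 0.50),
]
hard_query = [
    StrategyResult("bm25", 0.20, 0.10, 0.01),
    StrategyResult("minilm", 0.40, 0.33, 0.05),
    StrategyResult("bge", 0.80, 1.00, 0.50),
]

easy_label = label_query(easy_query)
hard_label = label_query(hard_query)
```

Under these assumed weights, the easy query is labeled with the cheapest strategy because the re-rankers add latency without a meaningful effectiveness gain, while the hard query is labeled with the heavy re-ranker because its gain outweighs the penalty. A routing classifier would then be trained to predict such labels from the query alone.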