Presenter: Zacky Albandar
Group Members: Sevyn Gitau, Prateek Basavaraj
Faculty Sponsor: Marco Serafini
School: UMass Amherst
Research Area: Computer Science
Session: Poster Session 4, 2:15 PM - 3:00 PM, 165, D16
ABSTRACT
Large-scale AI systems rely on Approximate Nearest Neighbor (ANN) search methods to efficiently handle similarity queries. The Hierarchical Navigable Small World (HNSW) algorithm is currently the state-of-the-art approach for ANN search. While HNSW achieves highly efficient average-case performance, it exhibits suboptimal worst-case search complexity due to the long-tail distribution problem, in which a small percentage of queries account for a disproportionately large number of graph traversals or hops. This long-tail effect introduces significant delays in real-time applications. Despite this, existing research has yet to investigate HNSW’s intrinsic topological properties to determine the causes for the long-tail distribution phenomenon. To address this gap, our research provides a systematic analysis of the root causes of the tail distribution by evaluating the influence of key construction parameters on query performance. We conducted rigorous empirical investigations into the following three properties: data dimensionality, graph degree, and efConstruction, utilizing both synthetic data and standard high-dimensional benchmarks. Through our experiments, we hypothesize that by identifying the best parameters for graph degree, dimensionality, and efConstruction, we can shorten the long tail distribution, reducing the number of worst-case searches which take significantly more traversals to reach the top k nearest neighbors. Based on our experimental investigations, we expect to show that lower-dimensional embeddings and larger efConstruction settings will be the most effective in reducing the "curse of dimensionality”. Our findings will provide guidelines for optimal HNSW parameters, thereby facilitating the development of more cost-effective and efficient HNSW graphs for vector search applications.
RELATED ABSTRACTS