Fortifying Small Language Models Against Query Injection Attacks

Presenter: Riddhimaan Senapati

Faculty Sponsor: James Allan

School: UMass Amherst

Research Area: Computer Science

Session: Poster Session 5, 3:15 PM - 4:00 PM, 163, C20

ABSTRACT

Relevance scores, a numerical representation of how useful a document is to a query, are used to calculate important information retrieval metrics such as precision, Normalized Discounted Cumulative gain etc. This process can be automated by Large Language Models (LLMs), but recent research has shown that LLMs are vulnerable to query injection where words from the query are injected into an irrelevant document leading to the LLM classifying the document as relevant. This thesis will focus on how to guard against these types of query injections for small language models (models with less than 10 billion parameters) by looking at prompt engineering techniques where we refine the prompt sent to the LLM, by running experiments with various type of attacks and mitigation strategies in order to find which techniques can help to make models perform better against query injection attacks. Results indicate that mitigation effectiveness varies significantly across different LLMs For gemma3:1b, few-shot prompting consistently performs best against all query injection attacks. For qwen3:0.6b, user prompt hardening generally achieves the best performance on all documents, though system prompt hardening performs better on relevant documents. These findings demonstrate that defending against query injection requires model-specific and sometimes even document-specific mitigation strategies.

RELATED ABSTRACTS