Efficient Hybrid Prompt-Pruning for Open-Source LLM-Based Machine Translation

Authors

Zaowad R. Abdullah1,3, Manal Iftikhar1,4, Md. Tariqul Islam2,3, and Rifat Shahriyar3; 1Alexa Translations, Pakistan; 2Verbex AI, Pakistan; 3Bangladesh University of Engineering and Technology, Bangladesh; 4FAST NUCES Lahore, Pakistan

Abstract

We propose a hybrid retrieval strategy for open-source LLM-based machine translation that filters out irrelevant top-k candidates before constructing the final translation prompt, thereby reducing input token count while maintaining or improving translation quality. Throughout this work, we demonstrate that fixed top-k retrieval in translation-specific LLMs is suboptimal, often incorporating redundant or irrelevant examples into the translation prompt. Our method combines dense embedding model relevance scores with normalized sparse BM25 scores to produce a hybrid score, which is then used to discard examples that fall below an empirically derived threshold. Unlike prior domain adaptation methods such as kNN-MT, LLM-based translation avoids dense token-level lookups; instead, it incorporates source-translation pairs that are semantically or lexically similar to the translation query into the prompt, achieving a significant degree of domain adaptation. Although LLM-based MT is simpler and substantially faster than kNN-MT, its quality depends heavily on the context provided. Fixed retrieval configurations (e.g., top-5 or top-10), commonly adopted from general NLP tasks, often include irrelevant or redundant examples. While reranker models are typically employed to reorder retrieved examples, they still rely on a fixed top-k setup, leading to the inclusion of superfluous examples. Our experiments demonstrate a simple yet effective method that dynamically filters out suboptimal examples, retaining only the most relevant context for each translation query. Experiments across seven domains and three language pairs (DE→EN, AR→EN, ZH→EN) show that our method preserves translation performance while significantly reducing prompt size. We also compare our setup with the popular reranker model Cohere Rerank 3.5 to establish the credibility of our work. Furthermore, evaluations on the PeerQA benchmark demonstrate substantial gains in zero-shot segment-level retrieval, validating the hybrid pruning method. Our findings highlight the impact of selective example retrieval for optimally domain-adapted multilingual machine translation.
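The abstract describes combining dense and sparse relevance signals and pruning retrieved examples below a threshold. The following is a minimal sketch of that idea, not the paper's actual implementation: the dense embeddings are assumed to be precomputed by some sentence-embedding model, and the mixing weight `alpha` and threshold `tau` are illustrative placeholders rather than the empirically derived values reported in the paper.

```python
# Sketch of hybrid (dense + BM25) scoring with threshold-based pruning.
# Assumes: query_emb / cand_embs come from any sentence-embedding model;
# alpha and tau are placeholder hyperparameters, not the paper's tuned values.
import numpy as np
from rank_bm25 import BM25Okapi  # pip install rank-bm25


def hybrid_prune(query_text, query_emb, cand_texts, cand_embs,
                 alpha=0.5, tau=0.6):
    """Return (candidate, hybrid_score) pairs whose hybrid score clears tau."""
    # Dense relevance: cosine similarity between query and candidate embeddings.
    cand_embs = np.asarray(cand_embs, dtype=float)
    query_emb = np.asarray(query_emb, dtype=float)
    dense = cand_embs @ query_emb / (
        np.linalg.norm(cand_embs, axis=1) * np.linalg.norm(query_emb) + 1e-9)

    # Sparse relevance: BM25 scores of the query against the candidate texts.
    bm25 = BM25Okapi([t.lower().split() for t in cand_texts])
    sparse = bm25.get_scores(query_text.lower().split())

    # Min-max normalize both score lists so they are on a comparable scale.
    def minmax(x):
        x = np.asarray(x, dtype=float)
        span = x.max() - x.min()
        return (x - x.min()) / span if span > 0 else np.zeros_like(x)

    hybrid = alpha * minmax(dense) + (1 - alpha) * minmax(sparse)

    # Keep only candidates above the threshold, best-scoring first;
    # anything below tau is pruned from the translation prompt.
    return [(cand_texts[i], float(hybrid[i]))
            for i in np.argsort(-hybrid) if hybrid[i] >= tau]
```

In this sketch, the number of in-context examples is no longer fixed at top-k: it varies per query with how many retrieved source-translation pairs clear the threshold.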

Keywords

Machine Translation, LLM, RAG (Retrieval-Augmented Generation), Information Retrieval, n-shot Prompting, Prompt-Pruning, Domain Adaptation