Volume 8, Number 4

Improving Search Engines by Demoting Non-Relevant Documents

  Authors

Fadi Yamout and Mireille Makary, The International University of Beirut, Lebanon

  Abstract

A good search engine aims to place more relevant documents at the top of the ranked list. This paper describes a new technique called “Improving search engines by demoting non-relevant documents” (DNR) that improves precision by detecting and demoting non-relevant documents. DNR generates a new set of queries composed of the terms of the original query combined in different ways. The documents retrieved by those new queries are evaluated using a heuristic algorithm to detect the non-relevant ones. These non-relevant documents are then moved down the list, which consequently improves precision. The new technique is tested on the WT2g test collection using several retrieval models: the vector model based on the TFIDF weighting measure, and the probabilistic models based on the BM25 and DFR-BM25 weighting measures. Recall and precision ratios are used to compare the performance of the new technique against that of the original query.
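
The idea of recombining query terms and demoting flagged documents can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that "combined in different ways" can be approximated by taking every proper, non-empty subset of the query terms, and it takes the set of documents flagged as non-relevant as an input, because the abstract does not specify the heuristic. The function names `term_combination_queries`, `demote`, and `precision_at_k` are hypothetical.

```python
from itertools import combinations


def term_combination_queries(query):
    """Assumption: build new queries from every proper, non-empty
    subset of the original query terms, keeping term order."""
    terms = query.split()
    return [
        " ".join(subset)
        for size in range(1, len(terms))
        for subset in combinations(terms, size)
    ]


def demote(ranking, flagged):
    """Move flagged documents to the bottom of the ranking while
    preserving the relative order within each group."""
    kept = [doc for doc in ranking if doc not in flagged]
    moved = [doc for doc in ranking if doc in flagged]
    return kept + moved


def precision_at_k(ranking, relevant, k):
    """Fraction of the top-k documents that are relevant."""
    return sum(1 for doc in ranking[:k] if doc in relevant) / k


# Toy data, invented for illustration only.
original_ranking = ["d1", "d2", "d3", "d4"]
relevant_docs = {"d1", "d3"}
flagged_docs = {"d2"}  # stand-in for the output of the heuristic

new_ranking = demote(original_ranking, flagged_docs)
```

In this toy case, demoting `d2` lifts `d3` into the top two positions, so precision at rank 2 rises from 0.5 to 1.0. Recall over the full list is unchanged, since demotion only reorders documents and removes none.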

  Keywords

Information retrieval, TFIDF, BM25, DFR-BM25