Academy & Industry Research Collaboration Center (AIRCC)

Volume 10, Number 09, July 2020

An Enhanced Lucene based System for Efficient Document/Information Retrieval

  Authors

Alaidine Ben Ayed¹, Ismaïl Biskri¹,² and Jean-Guy Meunier¹
¹ Université du Québec à Montréal (UQAM), Canada
² Université du Québec à Trois-Rivières (UQTR), Canada

  Abstract

In this paper, we implement a document retrieval system using Lucene and conduct experiments to compare the efficiency of two weighting schemes: the well-known TF-IDF and BM25. We then expand queries in two ways: using a comparable corpus (Wikipedia) and using word embeddings. The results show that the latter method (word embeddings) is a good way to achieve higher precision and to retrieve more relevant documents.
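To make the two weighting schemes and the embedding-based expansion step concrete, here is a minimal Python sketch. It is not the authors' implementation and not Lucene's code. It assumes textbook formulas: raw term frequency times log(N/df) for TF-IDF, and Okapi BM25 with the non-negative idf variant and k1 = 1.2, b = 0.75 (the default parameters of Lucene's BM25Similarity). Lucene's own ClassicSimilarity and BM25Similarity differ in normalization details. The corpus `DOCS`, the word vectors `EMB`, and the helpers `tf_idf`, `bm25`, `score`, `cosine` and `expand_query` are all hypothetical illustrations.

```python
import math

# Hypothetical toy corpus, already tokenized and lowercased.
DOCS = [
    "lucene is a search library".split(),
    "bm25 is a ranking function used by search engines".split(),
    "word embeddings map words to dense vectors".split(),
]

# Hypothetical 3-dimensional word vectors standing in for trained embeddings.
EMB = {
    "ranking": [0.9, 0.1, 0.0],
    "bm25": [0.8, 0.2, 0.1],
    "vectors": [0.0, 0.9, 0.4],
    "library": [0.1, 0.0, 0.9],
}


def tf_idf(term, doc, docs):
    """Textbook TF-IDF weight: raw term frequency times log(N / df)."""
    df = sum(1 for d in docs if term in d)
    if df == 0:
        return 0.0
    return doc.count(term) * math.log(len(docs) / df)


def bm25(term, doc, docs, k1=1.2, b=0.75):
    """Okapi BM25 term weight with the non-negative idf variant."""
    n = len(docs)
    df = sum(1 for d in docs if term in d)
    idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
    avgdl = sum(len(d) for d in docs) / n
    tf = doc.count(term)
    # Length normalization: longer-than-average documents are penalized.
    norm = k1 * (1 - b + b * len(doc) / avgdl)
    return idf * tf * (k1 + 1) / (tf + norm)


def score(query, doc, docs, weight):
    """Sum the per-term weights of all query terms for one document."""
    return sum(weight(t, doc, docs) for t in query)


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def expand_query(query, embeddings, k=1):
    """Append, for each query term, its k nearest words by cosine similarity."""
    expanded = list(query)
    for term in query:
        if term not in embeddings:
            continue  # out-of-vocabulary terms are left unexpanded
        neighbours = sorted(
            (w for w in embeddings if w != term and w not in expanded),
            key=lambda w: cosine(embeddings[term], embeddings[w]),
            reverse=True,
        )
        expanded.extend(neighbours[:k])
    return expanded


if __name__ == "__main__":
    query = ["ranking"]
    expanded = expand_query(query, EMB, k=1)
    print(expanded)  # ['ranking', 'bm25']
    for name, weight in (("TF-IDF", tf_idf), ("BM25", bm25)):
        for q in (query, expanded):
            print(name, q, [round(score(q, d, DOCS, weight), 3) for d in DOCS])
```

With these toy vectors, "ranking" is expanded with its nearest neighbour "bm25", which also occurs in the second document, so that document's score rises under both schemes. In a real system, the vectors would come from a trained embedding model, and the expanded query would be passed to Lucene instead of being scored by hand.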

  Keywords

Internet and Web Applications, Data and Knowledge Representation, Document Retrieval.