Volume 9, Number 1

On the Relevance of Query Expansion Using Parallel Corpora and Word Embeddings to Boost Text Document Retrieval Precision

  Authors

Alaidine Ben Ayed¹ and Ismaïl Biskri², ¹Université du Québec à Montréal (UQAM), Canada and ²Université du Québec à Trois-Rivières (UQTR), Canada

  Abstract

In this paper, we implement a document retrieval system using Lucene and run experiments to compare the effectiveness of two weighting schemes: the well-known TF-IDF and BM25. We then expand queries using a comparable corpus (Wikipedia) and word embeddings. The results show that the latter method (word embeddings) achieves higher precision and retrieves more relevant documents.
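
To make the two ideas in the abstract concrete, the following is a minimal Python sketch, not the system described in the paper and not Lucene's implementation. It scores a tokenized document with one common TF-IDF variant and with a textbook BM25 formula, then expands a query with its nearest neighbours in an embedding space. The function names, the parameter defaults (`k1=1.2`, `b=0.75`), the toy documents, and the two-dimensional "embeddings" are all assumptions made for illustration.

```python
import math
from collections import Counter


def tf_idf_score(query, doc, docs):
    """Sum of tf * log(N / df) over query terms (one common TF-IDF variant)."""
    n = len(docs)
    tf = Counter(doc)
    score = 0.0
    for term in query:
        df = sum(1 for d in docs if term in d)
        if df == 0:
            continue  # term absent from the collection contributes nothing
        score += tf[term] * math.log(n / df)
    return score


def bm25_score(query, doc, docs, k1=1.2, b=0.75):
    """Textbook BM25 with a smoothed, non-negative IDF (assumed defaults)."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    tf = Counter(doc)
    score = 0.0
    for term in query:
        df = sum(1 for d in docs if term in d)
        idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
        denom = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
        score += idf * tf[term] * (k1 + 1) / denom
    return score


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(a * a for a in v))
    return dot / (nu * nv) if nu and nv else 0.0


def expand_query(query, embeddings, top_k=1):
    """Append the top_k most similar vocabulary words for each query term."""
    expanded = list(query)
    for term in query:
        if term not in embeddings:
            continue
        candidates = [w for w in embeddings if w not in expanded]
        candidates.sort(key=lambda w: cosine(embeddings[term], embeddings[w]),
                        reverse=True)
        expanded.extend(candidates[:top_k])
    return expanded


# Hypothetical toy collection and hand-made 2-D vectors, for illustration only.
docs = [
    ["car", "engine", "repair"],
    ["automobile", "engine", "maintenance"],
    ["fruit", "banana", "apple"],
]
embeddings = {
    "car": [0.90, 0.10],
    "automobile": [0.88, 0.12],
    "banana": [0.10, 0.90],
}

query = ["car"]
expanded = expand_query(query, embeddings)
print(expanded)
print(bm25_score(query, docs[1], docs), bm25_score(expanded, docs[1], docs))
```

In this toy setting, the second document never mentions "car", so the original query gives it a BM25 score of zero; once the query is expanded with the nearest embedding neighbour, the same document receives a positive score. This only illustrates the mechanism and says nothing about the precision figures reported in the paper.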

  Keywords

Internet and Web Applications, Data and Knowledge Representation, Document Retrieval.