Volume 9, Number 1
On the Relevance of Query Expansion Using Parallel Corpora and Word Embeddings to Boost Text Document Retrieval Precision
Alaidine Ben Ayed (1) and Ismaïl Biskri (2); (1) Université du Québec à Montréal (UQAM), Canada; (2) Université du Québec à Trois Rivières (UQTR), Canada
In this paper, we implement a document retrieval system using Lucene and conduct experiments to compare the effectiveness of two weighting schemes: the well-known TF-IDF and BM25. We then expand queries using two resources: a comparable corpus (Wikipedia) and word embeddings. The results show that expansion with word embeddings achieves higher precision and retrieves more relevant documents.
Internet and Web Applications, Data and Knowledge Representation, Document Retrieval.
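To make the embedding-based expansion described in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes each query term is expanded with its nearest neighbours by cosine similarity. The toy three-dimensional vectors, the function name `expand_query`, and the parameters `k` and `min_sim` are all hypothetical and chosen only for illustration; a real system would load pretrained embeddings and pass the expanded query to the retrieval engine.

```python
import math

# Toy 3-dimensional vectors, invented for illustration only (assumption:
# a real system would load pretrained word embeddings instead).
EMBEDDINGS = {
    "car": [0.9, 0.1, 0.0],
    "automobile": [0.85, 0.15, 0.05],
    "vehicle": [0.8, 0.2, 0.1],
    "banana": [0.0, 0.1, 0.9],
    "fruit": [0.05, 0.2, 0.85],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_query(terms, embeddings, k=2, min_sim=0.9):
    """Append up to k nearest neighbours of each known term.

    k and min_sim are hypothetical tuning knobs, not values from the paper.
    Terms without an embedding are kept unchanged.
    """
    expanded = list(terms)
    for term in terms:
        if term not in embeddings:
            continue
        # Rank every other vocabulary word by similarity to the query term.
        neighbours = sorted(
            (
                (cosine(embeddings[term], vec), word)
                for word, vec in embeddings.items()
                if word != term
            ),
            reverse=True,
        )
        for sim, word in neighbours[:k]:
            if sim >= min_sim and word not in expanded:
                expanded.append(word)
    return expanded


if __name__ == "__main__":
    print(expand_query(["car"], EMBEDDINGS))
    print(expand_query(["banana"], EMBEDDINGS))
```

The similarity threshold keeps unrelated words out of the query even when fewer than `k` close neighbours exist, which is one simple way to limit the query drift that expansion can cause.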