Volume 8, Number 5

Bengali Information Retrieval System (BIRS)

  Authors

Md. Kowsher¹, Imran Hossen¹ and Sk Shohorab Ahmed², ¹Noakhali Science and Technology University, Bangladesh and ²University of Rajshahi, Bangladesh

  Abstract

An Information Retrieval System helps a user locate relevant information using Natural Language Processing (NLP). In this paper, we present an algorithmic Bengali Information Retrieval System (BIRS) grounded in mathematical and statistical methods. We describe two algorithms for lemmatizing Bengali words, Trie and Dictionary Based Search by Removing Affix (DBSRA), and compare them with Edit Distance for exact lemmatization. We present a Bengali anaphora resolution system based on Hobbs’ algorithm to recover the correct referents in the text. For question answering, TF-IDF and Cosine Similarity are used to find the most accurate answer in the documents. We also introduce a Bengali Language Toolkit (BLTK) and Bengali Language Expression (BRE), which simplify the implementation of our task. In addition, we have developed a Bengali root-word corpus, a synonym corpus, and a stop-word corpus, and gathered 672 articles from the popular Bengali newspaper ‘The Daily Prothom Alo’ as the system's input information. To test the system, we created 19335 questions from this input information and obtained 97.22% accurate answers.
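As an illustration of the question answering step, the following is a minimal sketch of ranking candidate sentences against a question with TF-IDF weights and cosine similarity. It is not the BIRS implementation: the function names, the whitespace tokenizer, and the smoothed IDF formula are assumptions made for this example, and the lemmatization, stop-word removal, and anaphora resolution stages described above are omitted.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: plain whitespace tokenization. A full pipeline would also
    # lemmatize tokens and remove stop words before weighting.
    return text.lower().split()


def build_idf(docs_tokens):
    # Document frequency: number of documents in which each term appears.
    n_docs = len(docs_tokens)
    df = Counter()
    for tokens in docs_tokens:
        df.update(set(tokens))
    # Smoothed IDF (an assumed variant; the exact weighting is not specified).
    return {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}


def tfidf_vector(tokens, idf):
    # Sparse TF-IDF vector as a dict; terms unseen in the corpus are dropped.
    tf = Counter(tokens)
    total = len(tokens) or 1
    return {t: (c / total) * idf[t] for t, c in tf.items() if t in idf}


def cosine_similarity(a, b):
    # Cosine of the angle between two sparse vectors; 0.0 if either is empty.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_answer(question, sentences):
    # Score every candidate sentence against the question and return the
    # highest-scoring sentence together with its similarity score.
    docs = [tokenize(s) for s in sentences]
    idf = build_idf(docs)
    vectors = [tfidf_vector(d, idf) for d in docs]
    query = tfidf_vector(tokenize(question), idf)
    scores = [cosine_similarity(query, v) for v in vectors]
    best = max(range(len(sentences)), key=scores.__getitem__)
    return sentences[best], scores[best]


if __name__ == "__main__":
    # Hypothetical English stand-ins for the Bengali article sentences.
    corpus = [
        "dhaka is the capital of bangladesh",
        "the padma is a major river",
        "cricket is popular in bangladesh",
    ]
    answer, score = best_answer("what is the capital of bangladesh", corpus)
    print(answer, round(score, 3))
```

The sentence whose TF-IDF vector points in the direction closest to the question's vector is returned as the answer; a score of 0.0 means the question shares no known term with any candidate.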

  Keywords

Bangla Language Processing, Information Retrieval, Corpus, Mathematics, Statistics.