Question Answering Module Leveraging Heterogeneous Datasets

Abinaya Govindan; Gyan Ranjan; Amit Verma

doi:10.5121/ijnlc.2021.10601

Volume 10, Number 6

Authors

Abinaya Govindan, Gyan Ranjan and Amit Verma, Neuron7.ai, USA

Abstract

Question answering has been a well-researched NLP area in recent years. Users increasingly need to query the wide variety of information available to them, whether structured or unstructured. In this paper, we propose a question answering module that (a) consumes a variety of data formats through a heterogeneous data pipeline, which ingests data from product manuals, technical data forums, internal discussion forums, groups, etc.; (b) addresses practical challenges faced in real-life situations by pointing to the exact segment of the manual or chat thread that can resolve a user query; and (c) provides segments of text when deemed relevant, based on the user query and business context. Our solution provides a comprehensive and detailed pipeline composed of data ingestion, data parsing, indexing, and querying modules. It handles a range of data sources such as text, images, tables, community forums, and flow charts. Our studies on a variety of business-specific datasets demonstrate the need for custom pipelines like the proposed one to solve several real-world document question-answering problems.

Keywords

Machine comprehension, document parser, question answering, information retrieval, heterogeneous data.