Academy & Industry Research Collaboration Center (AIRCC)

Volume 11, Number 19, November 2021

Unsupervised Named Entity Recognition for Hi-Tech Domain

  Authors

Abinaya Govindan, Gyan Ranjan and Amit Verma, Neuron7.ai, USA

  Abstract

This paper presents named entity recognition (NER) as a multi-answer question answering (QA) task combined with noise reduction based on contextual natural language inference (NLI). This formulation lets us use pre-trained models, already trained for specific downstream tasks, to generate training data without supervision, reducing the need to manually annotate tokens with named entity tags. For each entity, we provide a unique context, such as the entity type, a definition, questions, and a few empirical rules, along with the target text, to train a named entity model for the domain of interest. This formulation (a) allows the system to jointly learn NER-specific features from the datasets provided, (b) can extract multiple NER-specific features, thereby boosting the performance of existing NER models, and (c) provides business-contextualized definitions that reduce ambiguity among similar entities. We conducted numerous tests to determine the quality of the generated data and found that this method of data generation yields clean, noise-free data with minimal effort and time. The approach has been shown to be successful in extracting named entities, which are then used in subsequent components.
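
The formulation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `EntitySpec` fields, the `weak_label` function, the hypothesis template, the 0.5 threshold, and the whitespace tokenizer are all assumptions made for illustration, and the QA and NLI models are passed in as plain callables so that any pre-trained model could be substituted. The toy stand-ins at the end exist only to make the example runnable.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class EntitySpec:
    """Per-entity context (hypothetical structure, not from the paper)."""
    label: str       # entity type, e.g. "PRODUCT"
    definition: str  # business-contextualized definition
    question: str    # question posed to the QA model


# (question, text) -> character spans of all answers (multi-answer QA).
QAModel = Callable[[str, str], List[Tuple[int, int]]]
# (premise, hypothesis) -> entailment score in [0, 1].
NLIModel = Callable[[str, str], float]


def weak_label(text: str, specs: List[EntitySpec], qa: QAModel,
               nli: NLIModel, threshold: float = 0.5) -> List[Tuple[str, str]]:
    """Return (token, BIO tag) pairs generated without manual annotation."""
    # Whitespace tokenization with character offsets (an assumption).
    tokens = [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]
    tags = ["O"] * len(tokens)
    for spec in specs:
        for start, end in qa(spec.question, text):
            answer = text[start:end]
            # Assumed hypothesis template for the NLI noise filter.
            hypothesis = f"{answer} is a {spec.label}. {spec.definition}"
            if nli(text, hypothesis) < threshold:
                continue  # candidate not entailed by the text: treat as noise
            inside = False
            for i, (_, tok_start, tok_end) in enumerate(tokens):
                overlaps = tok_start < end and tok_end > start
                if overlaps and tags[i] == "O":
                    tags[i] = ("I-" if inside else "B-") + spec.label
                    inside = True
    return [(tok, tag) for (tok, _, _), tag in zip(tokens, tags)]


# Toy stand-ins for pre-trained models, used only to run the sketch.
def toy_qa(question: str, text: str) -> List[Tuple[int, int]]:
    spans = []
    for phrase in ("X200 router", "firmware"):  # "firmware" is a noisy answer
        idx = text.find(phrase)
        if idx >= 0:
            spans.append((idx, idx + len(phrase)))
    return spans


def toy_nli(premise: str, hypothesis: str) -> float:
    return 0.9 if hypothesis.startswith("X200") else 0.1


product = EntitySpec("PRODUCT", "A hardware device sold by the vendor.",
                     "Which products are mentioned?")
labeled = weak_label("The X200 router fails after the firmware update",
                     [product], toy_qa, toy_nli)
```

With these toy models, the QA step proposes both "X200 router" and "firmware" as products, and the NLI filter discards the second candidate, so only the first is tagged. The resulting token and tag pairs are the kind of generated data on which a domain NER model could then be trained.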

  Keywords

natural language processing, named entity recognition, unstructured data generation, question answering, information retrieval.