Academy & Industry Research Collaboration Center (AIRCC)

Volume 12, Number 19, November 2022

Roberta Goes for IPO: Prospectus Analysis with Language Models for Indian Initial Public Offerings

  Authors

Abhishek Mishra¹ and Yogendra Sisodia², ¹Trust Group, India, ²Conga, India

  Abstract

With the advent of large-scale language models in natural language processing (NLP), extracting valuable information from financial documents has gained popularity among researchers, and deep learning has boosted the development of effective text mining models. Prospectus text mining helps the investor community identify major risk factors and evaluate how the funds raised in an initial public offering (IPO) will be used. In this paper, we investigate how the recently introduced pre-trained language model RoBERTa can be adapted for this task. We also introduce prospectus-specific sentence transformers for semantic textual similarity, along with a dataset to verify the efficacy of our work.

  Keywords

IPO, Prospectus, Large Language Models, Semantic Textual Similarity.