Volume 9, Number 5
Purva Singh, VIT University, India
On 11th March 2020, the World Health Organization (WHO) declared Corona Virus Disease of 2019 (COVID-19) as a pandemic. Over time, the exponential growth of this disease has highlighted a mixture of sentiments expressed by the general population from various parts of the world speaking varied languages. It is, therefore, essential to analyze the public sentiment during this wave of the pandemic. While much work prevails to determine the sentiment polarity for tweets related to COVID-19, expressed in the English language, we still need to work on public sentiments expressed in languages other than English. This paper proposes a framework, Covhindia, a deep-learning framework that performs sentiment polarity detection of tweets related to COVID-19 posted in the Hindi language on the Twitter platform. The proposed framework leverages machine translation on Hindi tweets and passes the translated data as input to a deep learning model which is trained on an English corpus of COVID-19 tweets posted from India . The paper compares nine deep learning models' performances in classifying the sentiment polarity on an English dataset. Performance comparison of these architectures reveals that the BERT model had the best polarity detection accuracy on the English corpus. As part of testing the Covhindia’s accuracy in performing sentiment classification on Hindi tweets, the paper employs a separate dataset developed using a python library called Tweepy to extract Hindi tweets related to COVID-19. Experimental results reveal that Covhindia achieved state-of-the-art accuracy in classifying COVID-19 tweets posted in the Hindi language. The use of open-source machine translation tools paved the way for leveraging Covhindia for performing multilingual sentiment classification on COVID-19 tweets. For the benefit of the research community, the code and Jupyter Notebooks related to this paper are available on Github
Natural language Processing, multilingual sentiment polarity detection, LSTM, BERT, COVID-19, pretrained word embeddings.