Volume 14, Number 3
Combining Machine Learning and Semantic Analysis for Efficient Misinformation Detection
of Arabic Covid-19 Tweets
Authors
Abdulrahim Alhaizaey and Jawad Berri, King Saud University, Saudi Arabia
Abstract
With the spread of social media platforms and the proliferation of misleading news, misinformation detection within microblogging platforms has become a real challenge. During the Covid-19 pandemic, many fake news and rumors were broadcasted and shared daily on social media. In order to filter out these fake news, many works have been done on misinformation detection using machine learning and sentiment analysis in the English language. However, misinformation detection research in the Arabic language on social media is limited. This paper introduces a misinformation verification system for Arabic COVID-19 related news using an Arabic rumors dataset on Twitter. We explored the dataset and prepared it using multiple phases of preprocessing techniques before applying different machine learning classification algorithms combined with a semantic analysis method. The model was applied on 3.6k annotated tweets achieving 93% best overall accuracy of the model in detecting misinformation. We further build another dataset of Covid-19 related claims in Arabic to examine how our model performs with this new set of claims. Results show that the combination of machine learning techniques and linguistic analysis achieves the best scores reaching 92% best accuracy in detecting the veracity of sentences of the new dataset.
Keywords
Misinformation, machine learning, Arabic NLP, contextual exploration, rumor detection.