Hate Speech Detection of Arabic Shorttext

Abdullah Aref; Rana Husni Al Mahmoud; Khaled Taha; Mahmoud Al-Sharif

doi:10.5121/csit.2020.100507

Volume 10, Number 05, May 2020

Authors

Abdullah Aref¹, Rana Husni Al Mahmoud², Khaled Taha³ and Mahmoud Al-Sharif³, ¹Princess Sumaya University for Technology, Jordan, ²University of Jordan, Jordan and ³Trafalgar AI, Jordan

Abstract

The aim of sentiment analysis is to automatically extract the opinions expressed in a text and determine its sentiment. In this paper, we introduce the first publicly available Twitter dataset on Sunnah and Shia (SSTD), addressing religious hate speech, a subproblem of general hate speech detection. We further provide a detailed account of the data collection process and our annotation guidelines, which are intended to ensure reliable dataset annotation. We applied several stand-alone classification algorithms to the Twitter hate speech dataset, including Random Forest, Complement Naive Bayes, Decision Tree, and SVM, as well as two deep learning methods, CNN and RNN. We also study the influence of word embedding dimensions using FastText and word2vec. In all our experiments, the classification algorithms are trained using a random split of the data (66% for training and 34% for testing); the training and test sets are stratified samples of the original dataset. CNN-FastText achieves the highest F-measure (52.0%), followed by CNN-Word2vec (49.0%), showing that neural models with FastText word embeddings outperform classical feature-based models.
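The evaluation protocol described above (a stratified 66%/34% random split, scored by F-measure) can be illustrated with a minimal sketch. This is not the authors' code: the function names `stratified_split` and `f_measure`, the labels `"hate"` and `"not_hate"`, and the toy class counts are all hypothetical. The abstract also does not state how the F-measure is averaged across classes, so the sketch computes the plain F1 score for a single class.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=0.66, seed=0):
    """Split example indices so each class keeps roughly the same share
    in the training and test sets (hypothetical helper)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train, test = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)  # random split within each class
        cut = round(len(indices) * train_fraction)
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return sorted(train), sorted(test)


def f_measure(y_true, y_pred, positive):
    """F1 score (harmonic mean of precision and recall) for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data with made-up class counts, for illustration only.
toy_labels = ["hate"] * 30 + ["not_hate"] * 70
train_idx, test_idx = stratified_split(toy_labels)
print(len(train_idx), len(test_idx))
```

With these toy counts, each class contributes about 66% of its examples to the training set, so the class balance of the original data is preserved in both parts.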

Keywords

Hate Speech, Dataset, Text Classification, Sentiment Analysis
