ASERS-LSTM: Arabic Speech Emotion Recognition System Based on LSTM Model

doi:10.5121/sipij.2022.13102

Volume 13, Number 1

Authors

Mohammed Tajalsir¹, Susana Muñoz Hernández² and Fatima Abdalbagi Mohammed¹, ¹Sudan University of Science and Technology, Sudan, ²Technical University of Madrid (UPM), Computer Science School (FI), Spain

Abstract

Rapid progress in human-computer interaction (HCI) has increased interest in speech emotion recognition (SER) systems. A speech emotion recognition system identifies a person's emotional state from their voice. SER has been studied extensively for several languages, most commonly English and other European and Asian languages, but few studies address Arabic SER, largely because of the shortage of available Arabic speech emotion databases. Researchers have used several machine learning classifiers to distinguish emotional classes, including support vector machines (SVMs), random forests (RFs), the k-nearest neighbors (KNN) algorithm, hidden Markov models (HMMs), multilayer perceptrons (MLPs) and deep learning. In this paper we propose ASERS-LSTM, an Arabic speech emotion recognition model based on long short-term memory (LSTM). We extract five features from the speech signal: Mel-frequency cepstral coefficients (MFCC), chromagram, Mel-scaled spectrogram, spectral contrast and tonal centroid features (tonnetz). We evaluate the model on an Arabic speech dataset, the Basic Arabic Expressive Speech corpus (BAES-DB). In addition, we construct a deep neural network (DNN) to classify emotions and compare its accuracy with that of the LSTM model. The DNN achieves an accuracy of 93.34% and the LSTM 96.81%.
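For readers unfamiliar with the recurrent unit the model is built on, the following is a minimal sketch of one standard LSTM cell step in plain Python, applied frame by frame to a sequence of feature vectors. It illustrates the general technique only and is not the ASERS-LSTM architecture: the function names (lstm_step, init_params, encode_sequence), the layer sizes and the random weight initialisation are all assumptions made for this example.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def lstm_step(x, h_prev, c_prev, params):
    """One standard LSTM time step.

    params maps a gate name ("i", "f", "o", "g") to a tuple (W, U, b):
    input weights, recurrent weights and bias (hypothetical layout).
    """
    def gate(name, activation):
        W, U, b = params[name]
        pre = [a + r + bias
               for a, r, bias in zip(matvec(W, x), matvec(U, h_prev), b)]
        return [activation(p) for p in pre]

    i = gate("i", sigmoid)    # input gate
    f = gate("f", sigmoid)    # forget gate
    o = gate("o", sigmoid)    # output gate
    g = gate("g", math.tanh)  # candidate cell state

    # New cell state: keep part of the old state, add part of the candidate.
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    # New hidden state: gated, squashed view of the cell state.
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c


def init_params(n_in, n_hidden, seed=0):
    # Small random weights and zero biases; illustrative values only.
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                for _ in range(rows)]

    return {name: (mat(n_hidden, n_in), mat(n_hidden, n_hidden),
                   [0.0] * n_hidden)
            for name in "ifog"}


def encode_sequence(frames, params, n_hidden):
    # Run the cell over all frames and return the final hidden state.
    h = [0.0] * n_hidden
    c = [0.0] * n_hidden
    for x in frames:
        h, c = lstm_step(x, h, c, params)
    return h


if __name__ == "__main__":
    # Five toy "frames" of four features each stand in for real speech features.
    frames = [[0.1 * t, -0.2, 0.3, 0.05 * t] for t in range(5)]
    params = init_params(n_in=4, n_hidden=3)
    print(encode_sequence(frames, params, n_hidden=3))
```

In a classifier of this kind, the final hidden state would typically be passed to a dense layer with a softmax over the emotion classes. A practical system would use a deep learning framework rather than hand-written loops.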

Keywords

Emotion recognition, Deep learning, LSTM, DNN.