ASERS-CNN: Arabic Speech Emotion Recognition System based on CNN Model

doi:10.5121/sipij.2022.13104

Volume 13, Number 1

Authors

Mohammed Tajalsir¹, Susana Muñoz Hernández² and Fatima Abdalbagi Mohammed¹, ¹Sudan University of Science and Technology, Sudan, ²Technical University of Madrid (UPM), Computer Science School (FI), Spain

Abstract

When two people are on the phone, they cannot observe each other's facial expressions or physiological state, yet each can still roughly estimate the other's emotional state from the voice. In medical care, if the emotional state of a patient is known, especially that of a patient with an expression disorder, care measures can be adapted to the patient's mood to increase the level of care. A system capable of recognizing a person's emotional state from speech is known as a speech emotion recognition (SER) system. Deep learning is one of the most widely used techniques in emotion recognition studies. In this paper we propose ASERS-CNN, a CNN-based model for Arabic speech emotion recognition. We evaluate the model on an Arabic speech dataset, the Basic Arabic Expressive Speech corpus (BAES-DB). In addition, we compare its accuracy with that of our previous ASERS-LSTM model and find that the proposed ASERS-CNN model outperforms ASERS-LSTM, achieving 98.18% accuracy.

Keywords

BAES-DB, ASERS-LSTM, Deep learning, Speech emotion recognition.