Performance Analysis of Different Acoustic Features based on LSTM for Bangla Speech Recognition

Nahyan Al Mahmud

doi:10.5121/ijma.2020.12402

Volume 12, Number 1/2/3/4

Authors

Nahyan Al Mahmud, Ahsanullah University of Science and Technology, Bangladesh

Abstract

In this work, a new Bangla speech corpus with proper transcriptions has been developed, and various acoustic feature extraction methods have been investigated using a Long Short-Term Memory (LSTM) neural network to find how they can be integrated effectively into a state-of-the-art Bangla speech recognition system. The acoustic features are usually a sequence of representative vectors extracted from speech signals, and the classes are either words or subword units such as phonemes. The most commonly used feature extraction method, linear predictive coding (LPC), has been applied first in this work. Two other popular methods, Mel frequency cepstral coefficients (MFCC) and perceptual linear prediction (PLP), have then been applied; both are based on models of the human auditory system. A detailed review of the implementation of these methods is presented first. The implementation steps are then elaborated for the development of an automatic speech recognition (ASR) system for Bangla speech.

Keywords

Mel frequency cepstral coefficients, linear predictive coding, perceptual linear prediction, sentence correct rates, LSTM.