Enhancing Non-native Accent Recognition Through a Combination of Speaker Embeddings, Prosodic and Vocal Speech Features

doi:10.5121/sipij.2023.14302

Volume 14, Number 2/3

Authors

Anup Bera and Aanchal Agarwal, Accenture, India

Abstract

The transcription accuracy of an automatic speech recognition (ASR) system may suffer when recognizing accented speech. The resulting bias of an ASR system toward a specific accent arises from the under-representation of that accent in the training dataset. Recognizing the accents of existing speech samples can help with the preparation of training datasets, which is an important step toward closing the accent gap and eliminating biases in ASR systems. To that end, we built a system to recognize accents from speech data. In this study, we explore prosodic and vocal speech features as well as speaker embeddings for accent recognition on our custom English speech data, which covers speakers from around the world with varying accents. We demonstrate that our selected speech features are more effective in recognizing non-native accents. Additionally, we experimented with a hierarchical classification model for multi-level accent classification. To establish an accent hierarchy, we employed a bottom-up approach, combining regional accents and categorizing them as either native or non-native at the top level. Furthermore, we conducted a comparative study between flat classification and hierarchical classification using the accent hierarchy structure.
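
The contrast between flat and hierarchical classification can be illustrated with a minimal sketch. It is not the authors' implementation. The accent labels, the toy two-dimensional feature vectors, and the nearest-centroid classifier (a simple stand-in for the MLP classifier listed in the keywords) are all assumptions chosen to keep the example self-contained. The hierarchy is built bottom-up by mapping each regional accent to a native or non-native group; the hierarchical model first predicts the group and then picks a regional accent within it.

```python
from collections import defaultdict
from math import dist

# Hypothetical bottom-up hierarchy: regional accent -> top-level group.
# These labels are illustrative; the paper's actual classes are not listed here.
ACCENT_TO_GROUP = {
    "us": "native",
    "uk": "native",
    "indian": "non-native",
    "spanish": "non-native",
}


class CentroidClassifier:
    """Nearest-centroid classifier, used here as a stand-in for an MLP."""

    def fit(self, X, y):
        sums, counts = {}, defaultdict(int)
        for x, label in zip(X, y):
            prev = sums.get(label, [0.0] * len(x))
            sums[label] = [s + v for s, v in zip(prev, x)]
            counts[label] += 1
        # One mean vector per label.
        self.centroids = {
            label: [s / counts[label] for s in total]
            for label, total in sums.items()
        }
        return self

    def predict_one(self, x):
        return min(self.centroids, key=lambda lbl: dist(x, self.centroids[lbl]))


def fit_flat(X, y):
    """Flat classification: one model over all regional accents."""
    return CentroidClassifier().fit(X, y)


def fit_hierarchical(X, y, accent_to_group):
    """Top-level native/non-native model plus one model per group."""
    groups = [accent_to_group[a] for a in y]
    top = CentroidClassifier().fit(X, groups)
    leaves = {}
    for group in set(groups):
        rows = [(x, a) for x, a, g in zip(X, y, groups) if g == group]
        leaves[group] = CentroidClassifier().fit(
            [x for x, _ in rows], [a for _, a in rows]
        )
    return top, leaves


def predict_hierarchical(model, x):
    """Route a sample through the top level, then the matching leaf model."""
    top, leaves = model
    group = top.predict_one(x)
    return group, leaves[group].predict_one(x)


# Toy feature vectors (assumed; real features would be embeddings, prosody, etc.).
X = [(0.0, 0.0), (0.2, 0.1), (1.0, 0.0), (1.1, 0.2),
     (5.0, 5.0), (5.2, 5.1), (6.0, 5.0), (6.1, 5.2)]
y = ["us", "us", "uk", "uk", "indian", "indian", "spanish", "spanish"]

flat_model = fit_flat(X, y)
hier_model = fit_hierarchical(X, y, ACCENT_TO_GROUP)
```

In this sketch the flat model must separate all regional accents at once, whereas each leaf model in the hierarchy only has to distinguish the accents within its own group. An error at the top level, however, cannot be corrected further down.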

Keywords

Automatic speech recognition; Accent recognition; Hierarchical classification; MLP classifier; Speech features; Speaker embedding; Native and non-native accents