Multi-task Knowledge Distillation with Rhythm Features for Speaker Verification

doi:10.5121/csit.2020.100523

Volume 10, Number 05, May 2020

Authors

Ruyun Li¹, Peng Ouyang², Dandan Song² and Shaojun Wei¹, ¹Tsinghua University, China and ²TsingMicro Co. Ltd., China

Abstract

Recently, speaker embeddings extracted by deep neural networks (DNNs) have performed well in speaker verification (SV). However, they are sensitive to changes in scenario and too computationally intensive to deploy on portable devices. In this paper, we first combine rhythm and MFCC features to improve the robustness of speaker verification. The rhythm feature reflects the distribution of phonemes and helps reduce the equal error rate (EER) in speaker verification, especially in intra-speaker verification. In addition, we propose a multi-task knowledge distillation architecture that transfers the embedding-level and label-level knowledge of a well-trained large teacher network to a highly compact student network. The results show that rhythm features and multi-task knowledge distillation significantly improve the performance of the student network. In the ultra-short duration scenario, the student network, using only 14.9% of the teacher network's parameters, can even achieve a relative EER reduction of 32%.
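As a rough illustration of how embedding-level and label-level knowledge might be combined in a single training objective, the sketch below adds three terms: a hard-label cross-entropy, a cosine distance between student and teacher embeddings, and a KL divergence between temperature-softened posteriors. The abstract does not specify the loss functions, so all of these choices are assumptions, as are the function names, the weights `alpha` and `beta`, the temperature, and the requirement that both networks emit embeddings of the same dimension. Plain softmax cross-entropy stands in here for the angular softmax named in the keywords.

```python
import numpy as np


def softmax(logits, temperature=1.0):
    """Row-wise softmax with an optional temperature (numerically stable)."""
    z = logits / temperature
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def embedding_loss(student_emb, teacher_emb):
    """Embedding-level term (assumed form): mean cosine distance."""
    s = student_emb / np.linalg.norm(student_emb, axis=1, keepdims=True)
    t = teacher_emb / np.linalg.norm(teacher_emb, axis=1, keepdims=True)
    return float(np.mean(1.0 - np.sum(s * t, axis=1)))


def label_loss(student_logits, teacher_logits, temperature=2.0):
    """Label-level term (assumed form): mean KL(teacher || student)."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)), axis=1)
    return float(np.mean(kl))


def hard_label_loss(student_logits, labels):
    """Cross-entropy against ground-truth speaker labels.

    Plain softmax is used here; it is a stand-in, not angular softmax.
    """
    p = softmax(student_logits)
    picked = p[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(picked + 1e-12)))


def multitask_kd_loss(student_emb, student_logits, teacher_emb,
                      teacher_logits, labels, alpha=1.0, beta=1.0):
    """Weighted sum of the three terms; alpha and beta are hypothetical."""
    return (hard_label_loss(student_logits, labels)
            + alpha * embedding_loss(student_emb, teacher_emb)
            + beta * label_loss(student_logits, teacher_logits))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    teacher_emb = rng.normal(size=(4, 8))      # 4 utterances, 8-dim embeddings
    teacher_logits = rng.normal(size=(4, 5))   # 5 training speakers
    student_emb = rng.normal(size=(4, 8))
    student_logits = rng.normal(size=(4, 5))
    labels = np.array([0, 1, 2, 3])
    print(multitask_kd_loss(student_emb, student_logits,
                            teacher_emb, teacher_logits, labels))
```

Both distillation terms vanish when the student reproduces the teacher's embeddings and logits exactly, leaving only the hard-label term. In practice the relative weights would need tuning on held-out data.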

Keywords

Multi-task learning, Knowledge distillation, Rhythm variation, Angular softmax, Speaker verification.
