Bilingual I-Vector Extractor for DNN Hybrid Acoustic Model Training in German Speech Recognition Systems

Conference: Speech Communication - 14th ITG Conference
09/29/2021 - 10/01/2021, online

Proceedings: ITG-Fb. 298: Speech Communication

Pages: 5
Language: English
Type: PDF


Authors:
Wang, Yao; Gref, Michael; Walter, Oliver; Schmidt, Christoph (Fraunhofer Institute for Intelligent Analysis and Information Systems (IAIS), Sankt Augustin, Germany)

Abstract:
In recent research, i-vectors have been shown to be highly beneficial for speaker recognition and have been successfully applied in deep neural network (DNN) acoustic model (AM) training to improve the performance of automatic speech recognition (ASR). This paper describes our work on developing a bilingual i-vector extractor for training a German speech recognition system. A bilingual data set consisting of German and English speech data is used to train the i-vector extractor for a DNN hybrid acoustic model. The system is evaluated on several data sets. The results show that i-vector extractors trained on bilingual data can improve the training of ASR models when monolingual data is insufficient. Additionally, using telephone speech as a case study, we show that training the i-vector extractor on data from this domain improves recognition.
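To make the role of the extractor concrete, the sketch below shows the standard total-variability formulation of an i-vector: the MAP point estimate of a low-dimensional latent vector w, computed from an utterance's Baum-Welch statistics under a diagonal-covariance universal background model (UBM) and a total variability matrix T. This is a minimal sketch under stated assumptions, not the authors' implementation; the abstract does not name a toolkit or formulation. The UBM and T are assumed to be already trained (in the bilingual setting, on pooled German and English data). All function names and toy dimensions are hypothetical. Extracting one i-vector per utterance and tiling it onto every frame is a simplification; practical systems often use online or per-speaker extraction.

```python
import numpy as np

def gaussian_posteriors(frames, weights, means, variances):
    """Per-frame component posteriors gamma_t(c) under a diagonal-covariance UBM."""
    diff = frames[:, None, :] - means[None, :, :]                # (T, C, D)
    log_det = np.sum(np.log(2.0 * np.pi * variances), axis=1)    # (C,)
    log_lik = -0.5 * (log_det[None, :] + np.sum(diff**2 / variances[None, :, :], axis=2))
    log_joint = log_lik + np.log(weights)[None, :]
    log_joint -= log_joint.max(axis=1, keepdims=True)            # stabilise before exp
    post = np.exp(log_joint)
    return post / post.sum(axis=1, keepdims=True)

def extract_ivector(frames, weights, means, variances, T):
    """MAP point estimate of the i-vector w for one utterance.

    Hypothetical layout: T has shape (C, D, R), one D x R block T_c per UBM component.
    """
    gamma = gaussian_posteriors(frames, weights, means, variances)
    N = gamma.sum(axis=0)                          # zeroth-order stats N_c, (C,)
    F = gamma.T @ frames - N[:, None] * means      # centred first-order stats F_c, (C, D)
    R = T.shape[2]
    inv_var = 1.0 / variances                      # diagonal of Sigma_c^{-1}, (C, D)
    # Posterior precision: L = I + sum_c N_c T_c^T Sigma_c^{-1} T_c
    L = np.eye(R) + np.einsum('c,cdr,cd,cds->rs', N, T, inv_var, T)
    # Linear term: b = sum_c T_c^T Sigma_c^{-1} F_c
    b = np.einsum('cdr,cd,cd->r', T, inv_var, F)
    return np.linalg.solve(L, b)                   # w = L^{-1} b

def append_ivector(frames, ivector):
    """Tile the utterance-level i-vector onto every frame as extra DNN input features."""
    return np.hstack([frames, np.tile(ivector, (frames.shape[0], 1))])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    C, D, R = 4, 3, 2                              # toy sizes, purely illustrative
    weights = np.full(C, 1.0 / C)
    means = rng.normal(size=(C, D))
    variances = np.ones((C, D))
    T = rng.normal(scale=0.1, size=(C, D, R))      # stand-in for a trained extractor
    frames = rng.normal(size=(50, D))              # stand-in for acoustic features
    w = extract_ivector(frames, weights, means, variances, T)
    print(append_ivector(frames, w).shape)         # (50, 5)
```

In this formulation, the training data enters only through the UBM parameters and T. Training them on pooled bilingual data therefore changes the extractor without changing the extraction step itself.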