Rank based Decoding for Improved DNN/HMM Hybrid Acoustic Models in the EML Transcription Platform
Conference: Speech Communication - 12. ITG-Fachtagung Sprachkommunikation
10/05/2016 - 10/07/2016 at Paderborn, Germany
Proceedings: Speech Communication
Pages: 5
Language: English
Type: PDF
Fischer, Volker; Kunzmann, Siegfried (EML European Media Laboratory GmbH, Schloss-Wolfsbrunnenweg 35, 69118 Heidelberg, Germany)
In this paper we review the use of neural network based acoustic models in the EML Transcription Platform and describe some recent improvements to DNN/HMM hybrid acoustic modeling, including vocal tract length perturbation (VTLP) and the use of a DNN state prior for decoding. We investigate network adaptation techniques to overcome a noise level mismatch between training and test data and propose the use of a robust, rank based decoding method as an alternative to the standard softmax output layer. We examine its impact on word error rate for both original and adapted DNN/HMM models for several noise conditions. Initial results obtained on a publicly available data set suggest that a rank based output layer can outperform the conventional softmax layer and is a powerful method if no adaptation data is available.
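As a rough illustration of the two decoding ideas named in the abstract, the sketch below first shows the conventional hybrid conversion, in which softmax state posteriors are divided by a state prior to obtain scaled likelihoods for HMM decoding. It then shows one possible rank based score that depends only on the ordering of the output activations. The abstract does not define the paper's rank based output layer, so `rank_scores` and its linear rank penalty are assumptions made purely for illustration, as are all function names and the toy numbers.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over one frame's output activations."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def scaled_log_likelihoods(logits, state_priors):
    """Conventional hybrid scoring: log P(s|x) - log P(s) for each HMM state s.

    By Bayes' rule this equals log p(x|s) up to a per-frame constant.
    """
    log_post = log_softmax(logits)
    return [lp - math.log(p) for lp, p in zip(log_post, state_priors)]

def rank_scores(logits):
    """Hypothetical rank based score (an assumption, not the paper's definition).

    The top-ranked state scores 0, the next -1, and so on, so the result
    depends only on the ordering of the activations, not their magnitudes.
    """
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    scores = [0.0] * len(logits)
    for rank, idx in enumerate(order):
        scores[idx] = -float(rank)
    return scores

# Toy frame with three HMM states (made-up activations and priors).
logits = [2.0, 0.5, -1.0]
priors = [0.5, 0.3, 0.2]
softmax_based = scaled_log_likelihoods(logits, priors)
rank_based = rank_scores(logits)
```

Because the rank score ignores activation magnitudes, rescaling the activations by any positive factor leaves it unchanged. This is one plausible reading of why a rank based layer could be more robust to a noise level mismatch than softmax posteriors.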