Potentials for ASR based on Multiple Acoustic Models and Model Selection using Standard Speech Features

Conference: Sprachkommunikation - Beiträge zur 10. ITG-Fachtagung
26.09.2012 - 28.09.2012 in Braunschweig, Germany

Proceedings: Sprachkommunikation

Pages: 4
Language: English
Type: PDF

Authors:
Winkler, Thomas (Fraunhofer IAIS, 53757 Sankt Augustin, Germany)
Stein, Daniel; Bardeli, Rolf; Schneider, Daniel; Köhler, Joachim (Fraunhofer IAIS, 53757 Sankt Augustin, Germany)

Abstract:
Acoustic modelling is a key issue for successful automatic speech recognition (ASR). Common ASR systems are usually adapted to a certain use case by training robust acoustic models on speech data from the domain, recorded in conditions typical for the use case. Varying conditions thus require either multi-conditional or multiple acoustic models. In this work, we present a multi-model approach that copes with various acoustic conditions. For each utterance, the best-matching set of acoustic models is selected based on acoustic information from the same acoustic features and acoustic models that are used for ASR. Our initial experiments show that we achieve results comparable to a manual selection of the acoustic models, but that we are still slightly outperformed by multi-conditional models with a comparable number of mixtures. We further show that an ideal selection would indeed improve the results compared to multi-conditional models.
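
The per-utterance selection described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not state the selection criterion, so the sketch assumes that each acoustic condition is summarised by a diagonal-covariance Gaussian mixture model and that the condition with the highest average per-frame log-likelihood is chosen. The function names `gmm_avg_loglik` and `select_model`, the condition labels, and all parameter values are hypothetical.

```python
import numpy as np


def gmm_avg_loglik(frames, weights, means, variances):
    """Average per-frame log-likelihood of an utterance under a diagonal GMM.

    frames: (T, D) feature vectors, weights: (K,), means: (K, D),
    variances: (K, D). All shapes are assumptions of this sketch.
    """
    # Difference of every frame to every component mean: (T, K, D)
    diff = frames[:, None, :] - means[None, :, :]
    # Log normalisation term of each diagonal Gaussian: (K,)
    log_norm = -0.5 * np.log(2.0 * np.pi * variances).sum(axis=1)
    # Log density of every frame under every component: (T, K)
    log_comp = log_norm[None, :] - 0.5 * (diff ** 2 / variances[None, :, :]).sum(axis=2)
    log_weighted = log_comp + np.log(weights)[None, :]
    # Numerically stable log-sum-exp over the mixture components
    peak = log_weighted.max(axis=1, keepdims=True)
    frame_ll = peak[:, 0] + np.log(np.exp(log_weighted - peak).sum(axis=1))
    return float(frame_ll.mean())


def select_model(frames, condition_models):
    """Return the best-scoring condition label and all scores (assumed criterion)."""
    scores = {
        name: gmm_avg_loglik(frames, *params)
        for name, params in condition_models.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical toy setup: two single-component "condition" models in 2 dimensions.
clean = (np.array([1.0]), np.array([[0.0, 0.0]]), np.array([[1.0, 1.0]]))
noisy = (np.array([1.0]), np.array([[3.0, 3.0]]), np.array([[1.0, 1.0]]))
models = {"clean": clean, "noisy": noisy}

# Ten identical frames lying close to the mean of the "noisy" model.
frames = np.full((10, 2), 2.9)
best, scores = select_model(frames, models)
print(best, scores)
```

In a full system, the selected label would determine which set of acoustic models the recogniser uses for that utterance. An "ideal selection" in the sense of the abstract would instead pick, per utterance, the model set that yields the lowest recognition error, which requires reference transcripts and is therefore only available as an upper bound in experiments.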