Switching Linear Dynamic Models for Recognition of Emotionally Colored and Noisy Speech

Conference: Sprachkommunikation 2010 - 9. ITG-Fachtagung
06.10.2010 - 08.10.2010 in Bochum, Germany

Proceedings: Sprachkommunikation 2010

Pages: 4
Language: English
Type: PDF


Authors:
Wöllmer, Martin; Klebert, Nikolaj; Schuller, Björn (Technische Universität München, Institute for Human-Machine Communication, Theresienstr. 90, 80333 München, Germany)

Abstract:
Model-based speech feature enhancement techniques have been shown to be a promising approach to increasing the robustness of automatic speech recognition (ASR) in noisy environments. Strategies that model speech with a Switching Linear Dynamic Model (SLDM) have been successfully applied to noisy speech recognition tasks, since they overcome the limitations of GMM- or HMM-based approaches. However, SLDM-based feature enhancement has so far only been investigated for the recognition of isolated words or for relatively benign scenarios such as connected digit recognition in the presence of additive noise using whole-word models (e.g. the AURORA task). To assess the effectiveness of SLDM speech modeling for more challenging ASR applications, we evaluate SLDM feature enhancement for continuous recognition of spontaneous and emotionally colored speech in noise. As the back end, we use tied-state triphone models trained and evaluated on the SAL Corpus. Applying SLDM-based feature enhancement, we achieve an average relative performance gain of almost 20% across diverse noise settings.
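
For readers unfamiliar with the term, the following is a minimal sketch of what a switching linear dynamic model generates, not the authors' implementation. It assumes a scalar feature, a regime drawn independently for every frame, and Gaussian process noise; the function name `sample_sldm` and all parameter names and values are hypothetical and chosen only for illustration.

```python
import random


def sample_sldm(a, b, sigma, weights, n_frames, x0=0.0, seed=0):
    """Draw a scalar trajectory from a toy switching linear dynamic model.

    a, b, sigma: per-regime slope, offset and noise standard deviation
                 (illustrative parameters, not taken from the paper).
    weights:     per-regime selection weights (assumed independent per frame).
    """
    rng = random.Random(seed)
    states, xs = [], []
    x_prev = x0
    for _ in range(n_frames):
        # Pick which linear regime governs the current frame.
        s = rng.choices(range(len(weights)), weights=weights)[0]
        # Apply that regime's linear dynamics plus Gaussian process noise.
        x = a[s] * x_prev + b[s] + rng.gauss(0.0, sigma[s])
        states.append(s)
        xs.append(x)
        x_prev = x
    return states, xs


# Two regimes: a slowly decaying one and a faster one with an offset.
states, xs = sample_sldm(
    a=[0.9, 0.5], b=[0.0, 1.0], sigma=[0.1, 0.3],
    weights=[0.7, 0.3], n_frames=50,
)
```

In a feature enhancement setting, a model of this kind would serve as the prior over clean speech features, with the noisy observation explained by an additional noise model; that inference step is beyond this sketch.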