Utilizing Slow Feature Analysis for Lipreading
Conference: Speech Communication - 13. ITG-Fachtagung Sprachkommunikation
10.10.2018 - 12.10.2018 in Oldenburg, Germany
Proceedings: Speech Communication
Pages: 5
Language: English
Type: PDF
Authors:
Freiwald, Jan; Karbasi, Mahdie; Zeiler, Steffen; Melchior, Jan; Kompella, Varun; Wiskott, Laurenz; Kolossa, Dorothea (Institute of Communication Acoustics and Institute for Neural Computation, Ruhr University Bochum, Bochum, Germany)
Abstract:
While speech recognition has become highly robust in recent years, it remains a challenging task under very noisy or reverberant conditions. Augmenting speech recognition with lipreading from video input is hence a promising approach to making speech recognition more reliable. For this purpose, we consider slow feature analysis (SFA), an unsupervised machine learning method that finds the most slowly varying features in sequential input data. It can automatically extract temporally slow features from a video sequence, such as lip movements, while at the same time removing quickly changing components such as noise. In this work, we apply SFA as an initial feature extraction step for the task of automatic lipreading. The performance is evaluated on small-vocabulary lipreading, in both the speaker-dependent and the speaker-independent case, showing that the features are competitive with the often highly successful combination of a discrete cosine transform and a linear discriminant analysis, while also offering good interpretability.
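To illustrate the SFA principle described in the abstract, the sketch below implements plain linear SFA in NumPy: the input time series is whitened, and the directions whose temporal differences have the smallest variance are kept as the slowest features. This is a minimal sketch for intuition only, not the authors' video pipeline; the function name, parameters, and the synthetic usage example are illustrative assumptions.

    import numpy as np

    def linear_sfa(X, n_features=4):
        # Minimal linear SFA (hypothetical helper, not the paper's implementation).
        # X: time series of shape (T, D); returns the n_features slowest outputs,
        # shape (T, n_features).
        # Step 1: center and whiten so the input has zero mean and unit covariance.
        Xc = X - X.mean(axis=0)
        eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
        keep = eigvals > 1e-10                        # drop near-zero-variance directions
        W_white = eigvecs[:, keep] / np.sqrt(eigvals[keep])
        Z = Xc @ W_white
        # Step 2: among the whitened directions, keep those whose temporal
        # differences have the smallest variance; eigh sorts eigenvalues in
        # ascending order, so the first columns are the slowest features.
        dZ = np.diff(Z, axis=0)
        _, d_vecs = np.linalg.eigh(np.cov(dZ, rowvar=False))
        return Z @ d_vecs[:, :n_features]

    # Toy usage: a slow sine plus fast noise sources, mixed into 10 dimensions.
    T = 2000
    t = np.linspace(0, 4 * np.pi, T)
    rng = np.random.default_rng(0)
    sources = np.column_stack([np.sin(t), rng.standard_normal((T, 3))])
    X = sources @ rng.standard_normal((4, 10))
    y = linear_sfa(X, n_features=1)   # recovers the slow sine up to sign and scale

In the lipreading setting of the paper, the inputs would instead be sequences of mouth-region video frames, from which SFA extracts the slowly varying lip-movement components while suppressing fast-changing disturbances.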