A Feature-based Approach to Noise Robust Speech Detection

Conference: Sprachkommunikation (Speech Communication) - Contributions to the 10th ITG Conference
26.09.2012 - 28.09.2012 in Braunschweig, Germany

Proceedings: Sprachkommunikation

Pages: 4 | Language: English | Type: PDF


Author:
von Zeddelmann, Dirk (Fraunhofer FKIE, Communication Systems, Neuenahrer Str. 20, 53343 Wachtberg, Germany)

Abstract:
We propose a robust, easy-to-implement method for unsupervised Speech Detection (SD) in audio monitoring applications. SD is posed as a binary classification task whose goal is to localize speech in an acoustic monitoring recording. In realistic monitoring settings, speech is usually corrupted by masking noise components. The proposed method partly overcomes this problem by using a parametric feature extraction process similar to mel-frequency cepstral coefficients (MFCC), explicitly guided by the characteristics of human speech production and of human auditory perception. The resulting feature sequence is then interpreted as a set of subband signals. Because the feature extraction is adapted to the frequency range of speech, the energy of the averaged subband signals strongly emphasizes the relevant speech components. An experimental evaluation on both synthetic and real data shows a significant improvement over short-time energy-based methods for unsupervised Voice Activity Detection (VAD), especially at low SNR.
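The abstract does not specify the feature parametrization, the averaging, or the decision rule, so the following Python sketch only illustrates the general idea under stated assumptions. Here, a mel-spaced triangular filterbank restricted to an assumed 300-3400 Hz speech band stands in for the speech-specific frequency adaptation. Each filter output over time is treated as one subband signal and smoothed with a moving average, and the summed subband energy per frame is compared against a noise-floor estimate (assumed: 10th percentile of all frame scores plus a 3 dB margin). A plain short-time energy detector with the same threshold rule serves as the baseline. All function names and parameter values are hypothetical. The sketch also deliberately stops before the log compression and DCT of a full MFCC pipeline so that subband energies stay non-negative; it is not the author's method.

```python
import cmath
import math


def frames(signal, frame_len=256, hop=128):
    """Split a 1-D signal into overlapping frames (trailing partial frame dropped)."""
    return [signal[s:s + frame_len] for s in range(0, len(signal) - frame_len + 1, hop)]


def short_time_energy(frame):
    """Baseline feature: mean squared amplitude of one frame."""
    return sum(v * v for v in frame) / len(frame)


def power_spectrum(frame):
    """Power spectrum (bins 0..N/2) of a Hamming-windowed frame via a direct DFT."""
    n = len(frame)
    win = [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]
    xw = [v * w for v, w in zip(frame, win)]
    tw = [cmath.exp(-2j * math.pi * i / n) for i in range(n)]  # DFT twiddle factors
    return [abs(sum(xw[i] * tw[(k * i) % n] for i in range(n))) ** 2
            for k in range(n // 2 + 1)]


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_bins, fs, f_lo, f_hi):
    """Triangular mel filters limited to [f_lo, f_hi] Hz; one weight list per filter."""
    m_lo, m_hi = hz_to_mel(f_lo), hz_to_mel(f_hi)
    edges = [mel_to_hz(m_lo + i * (m_hi - m_lo) / (n_filters + 1))
             for i in range(n_filters + 2)]
    bin_hz = [k * fs / (2.0 * (n_bins - 1)) for k in range(n_bins)]
    return [[(f - left) / (center - left) if left < f <= center
             else (right - f) / (right - center) if center < f < right
             else 0.0
             for f in bin_hz]
            for left, center, right in zip(edges, edges[1:], edges[2:])]


def moving_average(seq, radius=2):
    """Centered moving average; the window shrinks at the sequence edges."""
    out = []
    for i in range(len(seq)):
        window = seq[max(0, i - radius):i + radius + 1]
        out.append(sum(window) / len(window))
    return out


def subband_scores(signal, fs, n_filters=12, f_lo=300.0, f_hi=3400.0, radius=2):
    """Per-frame score: summed energy of the temporally averaged mel subband signals."""
    spectra = [power_spectrum(fr) for fr in frames(signal)]
    bank = mel_filterbank(n_filters, len(spectra[0]), fs, f_lo, f_hi)
    # Each filter's output over time is treated as one subband signal.
    subbands = [[sum(w * p for w, p in zip(weights, spec)) for spec in spectra]
                for weights in bank]
    smoothed = [moving_average(band, radius) for band in subbands]
    return [sum(band[t] for band in smoothed) for t in range(len(spectra))]


def detect(scores, percentile=0.1, margin_db=3.0):
    """Unsupervised decision: speech where a score exceeds the noise floor by margin_db."""
    floor = sorted(scores)[int(percentile * (len(scores) - 1))]
    threshold = floor * 10.0 ** (margin_db / 10.0)
    return [s > threshold for s in scores]


# Toy data: a 100 Hz hum masks a 1 kHz "speech" tone in the second half (about -10 dB SNR).
fs = 8000
n = fs
hum = [math.sin(2 * math.pi * 100 * t / fs) for t in range(n)]
tone = [0.3 * math.sin(2 * math.pi * 1000 * t / fs) if t >= n // 2 else 0.0
        for t in range(n)]
x = [h + s for h, s in zip(hum, tone)]

baseline = detect([short_time_energy(fr) for fr in frames(x)])
proposed = detect(subband_scores(x, fs))
print(sum(baseline), sum(proposed[:25]), all(proposed[35:]))  # prints: 0 0 True
```

In this synthetic example the broadband energy rises by less than 0.5 dB when the tone starts, so the baseline never crosses its 3 dB margin, while the band-limited subband score separates the two halves clearly. This only illustrates the mechanism on a contrived signal and makes no claim about the evaluation reported in the paper.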