A Feature-based Approach to Noise Robust Speech Detection

Conference: Sprachkommunikation - Beiträge zur 10. ITG-Fachtagung
09/26/2012 - 09/28/2012 in Braunschweig, Germany

Proceedings: Sprachkommunikation

Pages: 4
Language: English
Type: PDF

Authors:
von Zeddelmann, Dirk (Fraunhofer FKIE, Communication Systems, Neuenahrer Str. 20, 53343 Wachtberg, Germany)

Abstract:
We propose a robust and easy-to-implement method for unsupervised speech detection (SD) in audio monitoring applications. SD is posed as a binary classification task whose goal is to localize speech in an acoustic monitoring recording. In realistic monitoring settings, speech is usually masked by interfering noise components. The proposed method overcomes this problem to a certain extent by using a parametric feature extraction process similar to that of mel-frequency cepstral coefficients (MFCCs), explicitly guided by human speech production and the perceptual characteristics of the human ear. The resulting feature sequence is subsequently interpreted as a set of subband signals. Owing to the speech-specific frequency adaptation in the feature extraction process, the energy content of the averaged subband signals strongly emphasizes relevant speech components. An experimental performance evaluation on both synthetic and real data shows a significant improvement, especially in low-SNR conditions, over short-time energy-based methods for unsupervised voice activity detection (VAD).
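
For orientation, the short-time energy baseline that the abstract compares against can be sketched roughly as follows. This is only an illustrative sketch, not the paper's method or its evaluated baseline: the function names, the frame length, the hop size, and the threshold rule (midpoint between the lowest and highest frame log energy) are all assumptions made here for the example. The proposed method would replace the plain frame energy with the energy of averaged, speech-adapted subband signals, which is not reproduced here.

```python
import math


def frame_log_energies(signal, frame_len=200, hop=100):
    """Return the log short-time energy (in dB) of each frame.

    frame_len and hop are illustrative values, not taken from the paper.
    """
    energies = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        # Small constant avoids log10(0) on digitally silent frames.
        energies.append(10.0 * math.log10(energy + 1e-12))
    return energies


def detect_speech(signal, frame_len=200, hop=100):
    """Binary decision per frame: True = active, False = inactive.

    Assumption: the threshold is set, without any training data, to the
    midpoint between the lowest and highest frame log energy.
    """
    energies = frame_log_energies(signal, frame_len, hop)
    if not energies:
        return []
    threshold = 0.5 * (min(energies) + max(energies))
    return [e > threshold for e in energies]


# Synthetic test signal at 8 kHz: weak background, then a 440 Hz tone
# standing in for speech, then weak background again.
fs = 8000
signal = []
for n in range(2400):
    background = 0.01 if n % 2 else -0.01
    tone = 0.5 * math.sin(2 * math.pi * 440 * n / fs) if 800 <= n < 1600 else 0.0
    signal.append(background + tone)

flags = detect_speech(signal)
```

A detector of this kind treats any high-energy frame as active, so it cannot tell speech from loud noise. That limitation is what a speech-specific feature representation, as described in the abstract, is meant to address.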