A Multi-Stage, Multi-Channel Processing System for Overlapping Speech Separation in a Real Scenario
Conference: Speech Communication - 11. ITG-Fachtagung Sprachkommunikation
09/24/2014 - 09/26/2014 in Erlangen, Germany
Proceedings: Speech Communication
Pages: 4
Language: English
Type: PDF
Toroghi, Rahil Mahdian; Oualil, Youssef; Klakow, Dietrich (Spoken Language Systems, Saarland University, Saarbruecken, Germany)
This paper addresses the problem of separating overlapping speech in a noisy room using a microphone array. The proposed approach is a multistage processing framework that separates the desired sources and reduces the corruptive effects of noise, reverberation, and interference. More specifically, 1) a beamformer separates the sources based on their location diversity, 2) a postfilter maximizes the output SNRs, 3) a novel filter is derived to suppress the coherent terms at each output with respect to the contrasting output, and 4) the clean signal is estimated using a modified masking filter. Because a desired signal remains coherent across time frames, the mask is smoothed between frames to preserve this coherence and reduce musical noise. Experiments on the AMI-Wall Street Journal corpus show a significant improvement in speech quality, SNR, source-to-reverberation ratio, and naturalness for the proposed method compared with several blind source separation methods.
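The inter-frame mask smoothing mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the paper's modified masking filter, so the code below is an assumption: it uses a generic Wiener-like ratio mask and a first-order recursive average along the time axis. The function names `wiener_like_mask` and `smooth_mask` and the smoothing factor `alpha` are hypothetical and chosen only for illustration.

```python
import numpy as np


def wiener_like_mask(target_power, interference_power, eps=1e-12):
    """Generic ratio mask in [0, 1]; an assumed stand-in, not the paper's filter."""
    return target_power / (target_power + interference_power + eps)


def smooth_mask(mask, alpha=0.7):
    """First-order recursive smoothing of a time-frequency mask along time.

    mask  : array of shape (n_freq, n_frames) with values in [0, 1]
    alpha : hypothetical smoothing factor in [0, 1); larger means smoother
    """
    mask = np.asarray(mask, dtype=float)
    smoothed = np.empty_like(mask)
    smoothed[:, 0] = mask[:, 0]  # the first frame has no predecessor
    for t in range(1, mask.shape[1]):
        # Blend the previous smoothed frame with the current raw frame.
        smoothed[:, t] = alpha * smoothed[:, t - 1] + (1.0 - alpha) * mask[:, t]
    return smoothed


# Toy data: random power spectrograms with 4 frequency bins and 50 frames.
rng = np.random.default_rng(0)
target = rng.random((4, 50))
interference = rng.random((4, 50))

raw = wiener_like_mask(target, interference)
smooth = smooth_mask(raw, alpha=0.7)
```

In this sketch, a rapidly fluctuating mask is what typically produces isolated time-frequency peaks perceived as musical noise; averaging across frames damps those fluctuations at the cost of slower tracking of speech onsets, which is the trade-off controlled by `alpha`.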