A Maximum Entropy Information Bottleneck (MEIB) Regularization for Generative Speech Enhancement with HiFi-GAN
Conference: Speech Communication - 15th ITG Conference
09/20/2023 - 09/22/2023 at Aachen
doi:10.30420/456164055
Proceedings: ITG-Fb. 312: Speech Communication
Pages: 5
Language: English
Type: PDF
Authors:
Sach, Marvin; Pirklbauer, Jan; Fingscheidt, Tim (Institute for Communications Technology, TU Braunschweig, Germany)
Fluyt, Kristoff; Tirry, Wouter (Goodix Technology (Belgium) BV, Leuven, Belgium)
Abstract:
Generative approaches to speech enhancement that use a vocoder to synthesize a clean speech estimate aim at solving the problem of residual noise occurring with typical mask-based spectral estimation approaches. The necessity to restrict the system's knowledge to clean speech only, and to prevent the possibility of noise reconstruction, has recently motivated the introduction of a sparse autoencoder (AE) bottleneck using a pre-trained vector quantizer codebook. In our work, inspired by information bottleneck theory, we propose a maximum entropy information bottleneck (MEIB) regularization, which we derive for the deterministic AE with quantized bottleneck on time series. Furthermore, we introduce a feature-matching regularization encouraging noisy inputs and clean inputs to select the same vector quantizer symbols. The proposed methods significantly improve our denoising performance, by 0.23 PESQ points and 0.06 DNSMOS points.
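The abstract names two losses: an entropy-based regularizer on codebook usage (MEIB) and a feature-matching term that pushes noisy and clean encodings toward the same vector quantizer symbols. The following is a minimal NumPy sketch of how such losses could look; it is an illustrative assumption based on the abstract only, not the authors' derivation, and all function names (`codebook_logits`, `meib_regularizer`, `feature_matching_regularizer`) are hypothetical:

```python
import numpy as np

def codebook_logits(z, codebook):
    # Negative squared Euclidean distances to each codebook vector
    # serve as (soft) symbol-selection logits; argmax gives the VQ symbol.
    d2 = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return -d2

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def meib_regularizer(z, codebook):
    # One plausible reading of a maximum-entropy bottleneck term:
    # maximize the entropy of the batch-averaged codebook-usage
    # distribution, i.e. minimize sum(p * log p) (which is -H(p) <= 0).
    p = softmax(codebook_logits(z, codebook)).mean(axis=0)
    return (p * np.log(p + 1e-12)).sum()

def feature_matching_regularizer(z_noisy, z_clean, codebook):
    # Cross-entropy between the noisy encoding's soft symbol assignment
    # and the clean encoding's hard symbol choice, encouraging both
    # inputs to select the same vector quantizer symbols.
    target = codebook_logits(z_clean, codebook).argmax(axis=-1)
    log_p = np.log(softmax(codebook_logits(z_noisy, codebook)) + 1e-12)
    return -log_p[np.arange(len(target)), target].mean()
```

Under this reading, the MEIB term discourages the encoder from collapsing onto a few codebook entries, while the feature-matching term ties the noisy-input symbol selection to the clean-input one, so that noise cannot be encoded through alternative symbol choices.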