Exploring Visualization Techniques for Interpretable Learning in Speech Enhancement Deep Neural Networks

Conference: Speech Communication - 15th ITG Conference
09/20/2023 - 09/22/2023 at Aachen

doi:10.30420/456164043

Proceedings: ITG-Fb. 312: Speech Communication

Pages: 5
Language: English
Type: PDF

Authors:
Nustede, Eike J.; Anemueller, Joern (Carl von Ossietzky University Oldenburg, Computational Audition Group, Germany & Department of Medical Physics and Acoustics and Cluster of Excellence Hearing4all, Oldenburg, Germany)

Abstract:
Deep network interpretability research provides an avenue to gain insights into the underlying mechanisms of noise suppression and artefact removal in speech enhancement networks. Analyzing the networks’ internal representations and activation patterns allows identification of critical acoustic features and provides another approach to model optimization. In this paper, we contribute to the visualization of speech processing networks by leveraging the U-Network’s hierarchical and convolutional processing scheme to derive feature maps of audio samples at different levels in the network. Further, we adapt the activation maximization method to speech enhancement networks by regularizing the maximization process. The presented feature maps and filter activations follow a clear encoding/decoding scheme: the encoder decomposes the input into distinct acoustic features, while the decoder recombines sparse features into an enhanced spectrogram.
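The layer-wise feature maps described in the abstract can be obtained with standard forward hooks. The following PyTorch sketch uses a hypothetical minimal U-Net-style model (the paper's actual architecture differs) purely to illustrate recording each layer's output for a spectrogram input:

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    """Toy stand-in for a U-Net speech enhancement network (illustrative only)."""
    def __init__(self):
        super().__init__()
        self.enc1 = nn.Conv2d(1, 8, 3, stride=2, padding=1)    # encoder: downsample
        self.enc2 = nn.Conv2d(8, 16, 3, stride=2, padding=1)
        self.dec1 = nn.ConvTranspose2d(16, 8, 4, stride=2, padding=1)  # decoder: upsample
        self.dec2 = nn.ConvTranspose2d(8, 1, 4, stride=2, padding=1)

    def forward(self, x):
        e1 = torch.relu(self.enc1(x))
        e2 = torch.relu(self.enc2(e1))
        d1 = torch.relu(self.dec1(e2))
        return self.dec2(d1)

def collect_feature_maps(model, spec):
    """Run a spectrogram through the model, recording every child layer's output."""
    maps, hooks = {}, []
    for name, module in model.named_children():
        def make_hook(key):
            def hook(_mod, _inp, out):
                maps[key] = out.detach()
            return hook
        hooks.append(module.register_forward_hook(make_hook(name)))
    with torch.no_grad():
        model(spec)
    for h in hooks:
        h.remove()  # clean up so the model is unchanged afterwards
    return maps

spec = torch.randn(1, 1, 64, 64)  # dummy (batch, channel, freq, time) spectrogram
maps = collect_feature_maps(TinyUNet(), spec)
for name, fmap in maps.items():
    print(name, tuple(fmap.shape))
```

Each recorded tensor can then be rendered channel-by-channel as an image, which is the basis for the feature-map visualizations at different network levels.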
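The regularized activation maximization mentioned above can be sketched as gradient ascent on the input spectrogram. The regularizer choice here (an L2 norm plus a total-variation penalty) and all layer/parameter names are illustrative assumptions, not the paper's exact formulation:

```python
import torch
import torch.nn as nn

def activation_maximization(model, layer, channel, shape=(1, 1, 64, 64),
                            steps=100, lr=0.1, l2=1e-2, tv=1e-2):
    """Optimize an input spectrogram to maximize the mean activation of one
    channel in `layer`; L2 and total-variation terms keep the result bounded
    and smooth (regularizers are illustrative, not the paper's exact choice)."""
    act = {}
    hook = layer.register_forward_hook(lambda m, i, o: act.update(out=o))
    x = torch.zeros(shape, requires_grad=True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        model(x)
        objective = act["out"][0, channel].mean()
        # Penalize large magnitudes and high-frequency structure in the input.
        reg = l2 * x.pow(2).mean() \
            + tv * ((x[..., 1:, :] - x[..., :-1, :]).abs().mean()
                    + (x[..., :, 1:] - x[..., :, :-1]).abs().mean())
        loss = -objective + reg  # gradient ascent on the activation
        loss.backward()
        opt.step()
    hook.remove()
    return x.detach()

# Usage with a small conv layer as a stand-in for an enhancement-network filter:
net = nn.Sequential(nn.Conv2d(1, 4, 3, padding=1), nn.ReLU())
pattern = activation_maximization(net, net[0], channel=0, steps=50)
print(pattern.shape)
```

The returned tensor is the synthetic spectrogram that most strongly drives the chosen filter, which can then be plotted alongside the feature maps.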