Exploring Visualization Techniques for Interpretable Learning in Speech Enhancement Deep Neural Networks
Conference: Speech Communication - 15th ITG Conference
09/20/2023 - 09/22/2023 at Aachen
doi:10.30420/456164043
Proceedings: ITG-Fb. 312: Speech Communication
Pages: 5
Language: English
Type: PDF
Authors:
Nustede, Eike J.; Anemueller, Joern (Carl von Ossietzky University Oldenburg, Computational Audition Group, Germany & Dept. med. Physics & Acoustics and Cluster of Excellence Hearing4all, Oldenburg, Germany)
Abstract:
Deep network interpretability research provides an avenue to gain insights into the underlying mechanisms of noise suppression and artefact removal in speech enhancement networks. Analyzing the networks' internal representations and activation patterns allows identification of critical acoustic features and provides an additional approach to model optimization. In this paper, we contribute to the visualization of speech processing networks by leveraging the U-Network's hierarchical and convolutional processing scheme to derive feature maps of audio samples at different levels of the network. Further, we adapt the activation maximization method to speech enhancement networks by regularizing the maximization process. The presented feature maps and filter activations follow a clear encoding/decoding scheme: the encoder decomposes the input into distinct acoustic features, while the decoder combines sparse features into an enhanced spectrogram.
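The regularized activation maximization mentioned in the abstract can be illustrated with a minimal sketch. This is a hypothetical toy example, not the authors' implementation: it maximizes the activation of a single linear "filter" `w` subject to an L2 penalty, the simplest form of the regularization that keeps synthesized inputs bounded; in a real enhancement network, the gradient would be obtained by back-propagating through the model.

```python
import numpy as np

# Toy sketch of regularized activation maximization (hypothetical,
# not the paper's implementation). For a single linear "filter" w,
# maximize the activation w @ x minus an L2 penalty on the input:
#     J(x) = w @ x - lam * ||x||^2
# Gradient ascent on J converges to the closed-form maximizer
# x* = w / (2 * lam), so the regularizer controls the input's scale.

rng = np.random.default_rng(0)
w = rng.standard_normal(16)     # stand-in for a learned filter
lam = 0.5                       # L2 regularization strength
lr = 0.1                        # step size for gradient ascent
x = np.zeros_like(w)            # start from a neutral (silent) input

for _ in range(500):
    grad = w - 2.0 * lam * x    # dJ/dx for the objective above
    x += lr * grad              # ascend the regularized activation

# With lam = 0.5 the maximizer is exactly w, so x converges to w.
print(np.allclose(x, w / (2 * lam), atol=1e-6))  # → True
```

In a speech enhancement network the same loop would optimize a spectrogram input, with the L2 term (or a smoothness prior) preventing the unbounded, unnatural inputs that unregularized maximization produces.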