Distribution Mismatch Correction for Acoustic Scene Classification

Conference: Speech Communication - 15th ITG Conference
20-22 September 2023 in Aachen

doi:10.30420/456164041

Proceedings: ITG-Fb. 312: Speech Communication

Pages: 5
Language: English
Type: PDF

Authors:
Maier, Lukas; Fuchs, Alexander; Pernkopf, Franz (Christian Doppler Laboratory for Dependable Intelligent Systems in Harsh Environments, Graz University of Technology, Austria)

Abstract:
While deep learning methods have greatly improved performance on Acoustic Scene Classification (ASC) tasks, they also introduce new challenges, as they are prone to large performance degradation on out-of-distribution data. To build robust ASC models that achieve reliable performance across multiple recording devices, the architecture has to adapt quickly to changing input and activation distributions. We present ASCMobConvNet, a CNN architecture based on Mobile Inverted Bottleneck Convolutions. To better adapt to domain shifts and the resulting changes in activation distributions, it uses sub-spectral normalization layers in combination with residual normalization instead of batch normalization layers. Furthermore, the model corrects non-parametric mismatches in the activation distributions through the integration of Wasserstein distribution correction layers. With the proposed architecture, we achieve a test accuracy of 68.10% on the TAU Urban Acoustic Scenes 2020 Mobile development dataset. Adding Wasserstein distribution correction layers further improves the accuracy by 0.68%.
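
The abstract does not describe how the Wasserstein distribution correction layers are built. As a rough illustration of the general idea only, the sketch below corrects a one-dimensional mismatch between two sets of activations by quantile mapping, which is the optimal-transport map under the Wasserstein distance in one dimension. The function names `wasserstein_1d` and `quantile_map`, the synthetic data, and the choice of NumPy are assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def wasserstein_1d(a, b):
    """Empirical 1-Wasserstein distance between two equal-size 1-D samples."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    if a.shape != b.shape:
        raise ValueError("this sketch assumes samples of equal size")
    # In 1-D the optimal coupling pairs the sorted samples element by element.
    return float(np.mean(np.abs(a - b)))


def quantile_map(target, reference):
    """Map each target value to the reference value at the same empirical quantile.

    Hypothetical helper: a non-parametric correction that preserves the
    ordering of the target values while matching the reference distribution.
    """
    target = np.asarray(target, dtype=float)
    reference = np.sort(np.asarray(reference, dtype=float))
    ranks = np.argsort(np.argsort(target))           # rank of each target value
    q = (ranks + 0.5) / target.size                  # its empirical quantile
    ref_q = (np.arange(reference.size) + 0.5) / reference.size
    return np.interp(q, ref_q, reference)


# Synthetic stand-ins (assumed): activations seen during training versus
# shifted, skewed activations from an unseen recording device.
rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, size=1000)
target = rng.gamma(2.0, 1.0, size=1000)

corrected = quantile_map(target, reference)
before = wasserstein_1d(target, reference)
after = wasserstein_1d(corrected, reference)
print(f"W1 before: {before:.3f}, after: {after:.3f}")
```

Because the mapping works on ranks rather than on a mean and variance, it can remove mismatches in the shape of the distribution (such as skew) that an affine normalization cannot, which is one reading of "non-parametric" in the abstract.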