CRNN-based Multi-DOA Estimator: Comparing Classification and Regression

Conference: Speech Communication - 15th ITG Conference
09/20/2023 - 09/22/2023 in Aachen

doi:10.30420/456164030

Proceedings: ITG-Fb. 312: Speech Communication

Pages: 5
Language: English
Type: PDF

Authors:
Cooreman, Pieter; Bohlender, Alexander; Madhu, Nilesh (IDLab, Department of Electronics and Information Systems, Ghent University - imec, Belgium)

Abstract:
Deep learning methods have greatly improved the localization of sound sources in adverse conditions. An important design choice here is the output representation. Direction of arrival (DOA) estimation can be interpreted as a classification problem, but performing a regression to estimate the DOAs continuously is also possible. Whereas classification and regression were previously compared for particular cases, such as frame-wise DOA estimation and single-source conditions, in this paper we study the more general localization of one or two concurrent sources with a convolutional recurrent neural network. Our experiments show that the two approaches perform comparably in single-source scenarios. To address the ambiguity in the source-to-output assignment when multiple DOAs are estimated using regression, we consider permutation invariant training and angular sorting of the desired outputs. However, we find that classification is then generally preferred, especially for closely spaced sources.
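To make the two output representations and the two regression assignment strategies concrete, here is a minimal, stdlib-only sketch of the *training targets and losses* only. The CRNN itself is omitted. It is not the authors' implementation, and several choices are assumptions made for illustration: azimuths restricted to 0° to 180° on a 5° grid (`GRID`), a multi-hot classification target decoded by simple local-maximum peak picking, and mean squared error as the regression distance. The helper names (`classification_target`, `decode_classification`, `pit_loss`, `sorted_loss`) are hypothetical. Handling a variable number of sources (one or two) is not shown.

```python
from itertools import permutations

# Assumption: azimuth range 0..180 degrees on a 5-degree grid (37 classes).
GRID = list(range(0, 181, 5))


def classification_target(doas):
    """Classification: multi-hot vector with a 1 at the grid bin nearest each DOA."""
    target = [0.0] * len(GRID)
    for doa in doas:
        nearest = min(range(len(GRID)), key=lambda i: abs(GRID[i] - doa))
        target[nearest] = 1.0
    return target


def decode_classification(scores, num_sources):
    """Return the grid DOAs of the num_sources highest local maxima in scores."""
    peaks = [
        i for i in range(len(scores))
        if (i == 0 or scores[i] >= scores[i - 1])
        and (i == len(scores) - 1 or scores[i] > scores[i + 1])
    ]
    peaks.sort(key=lambda i: scores[i], reverse=True)
    return sorted(GRID[i] for i in peaks[:num_sources])


def mse(pred, target):
    """Mean squared error between two equally long lists of angles (degrees)."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def pit_loss(pred, target):
    """Permutation invariant training: loss of the best output-to-source assignment."""
    return min(mse(pred, list(perm)) for perm in permutations(target))


def sorted_loss(pred, target):
    """Angular sorting: output k is always compared with the k-th smallest DOA."""
    return mse(pred, sorted(target))


if __name__ == "__main__":
    scores = classification_target([40, 90])
    print(decode_classification(scores, 2))  # [40, 90]
    # Two regression outputs that are accurate but in the "wrong" order.
    print(pit_loss([88.0, 41.0], [40.0, 90.0]))     # 2.5
    print(sorted_loss([88.0, 41.0], [40.0, 90.0]))  # 2352.5
```

The example shows the core difference between the two assignment strategies. PIT picks whichever pairing of outputs and sources fits best, so the swapped prediction is barely penalized. Angular sorting fixes the pairing in advance, so the network must also learn the ordering convention (first output carries the smaller angle).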