Global vs. Local Federated Learning in Heterogeneous Acoustic Environments

Conference: Speech Communication - 15th ITG Conference
09/20/2023 - 09/22/2023 in Aachen

doi:10.30420/456164033

Proceedings: ITG-Fb. 312: Speech Communication

Pages: 5
Language: English
Type: PDF

Authors:
Glitza, Rene; Becker, Luca; Martin, Rainer (Institute of Communication Acoustics, Ruhr-Universität Bochum, Germany)

Abstract:
In complex acoustic environments, speech signals picked up by distributed microphones will depend on room acoustics and local ambient noise conditions. While privacy concerns would disallow sending raw audio to a centralized server, it is still of interest to adapt clients and possibly a centralized server to local conditions. In this paper, we analyze the benefits of (clustered) federated learning in a speech classification experiment where we adapt the classifier to local acoustic conditions. Specifically, we consider a multi-room apartment scenario with several acoustic sensors, target speakers, and ambient noise sources. A baseline gender recognition model trained on clean data is compared to models trained either with data from small sets of local (within room) clients or trained on data from clients of all rooms. The results of this case study show that averaging data from all clients of the simulated apartment increases the performance of the global model relative to the baseline, and that room-specific classification models improve classification performance on data in their respective local environment. Thus, it appears to be worthwhile and feasible to cluster devices in smart home environments and to adapt them to local acoustic conditions.
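
As an illustration of the global versus room-specific comparison described in the abstract, the following is a minimal sketch of sample-weighted model averaging (in the style of federated averaging), applied once across all clients and once per room. It is not taken from the paper: the function names `fed_avg` and `clustered_fed_avg`, the room labels, the toy parameter vectors, and the choice to weight clients by sample count are all assumptions made for illustration. The abstract does not specify the aggregation rule or the clustering procedure.

```python
# Hypothetical sketch: global vs. per-room (clustered) model averaging.
# All names and numbers below are illustrative assumptions, not the paper's setup.

def fed_avg(client_weights, client_sizes):
    """Average client parameter vectors, weighted by local sample count."""
    total = float(sum(client_sizes))
    n_params = len(client_weights[0])
    return [
        sum(w[j] * s for w, s in zip(client_weights, client_sizes)) / total
        for j in range(n_params)
    ]

def clustered_fed_avg(client_weights, client_sizes, client_rooms):
    """Build one averaged model per room from that room's clients only."""
    models = {}
    for room in sorted(set(client_rooms)):
        idx = [i for i, r in enumerate(client_rooms) if r == room]
        models[room] = fed_avg(
            [client_weights[i] for i in idx],
            [client_sizes[i] for i in idx],
        )
    return models

# Toy example: four clients in two rooms, each holding a 2-parameter model.
weights = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 6.0]]
sizes = [10, 30, 20, 20]  # number of local training samples per client
rooms = ["kitchen", "kitchen", "living_room", "living_room"]

global_model = fed_avg(weights, sizes)                   # one model for all rooms
room_models = clustered_fed_avg(weights, sizes, rooms)   # one model per room
```

In this toy setup the global model blends the parameters of both rooms, while each room model reflects only its own clients. This mirrors the trade-off the abstract reports between a global model and room-specific models, without reproducing the paper's actual classifier or data.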