On Feature Importance and Interpretability of Speaker Representations

Conference: Speech Communication - 15th ITG Conference
09/20/2023 - 09/22/2023 at Aachen

doi:10.30420/456164037

Proceedings: ITG-Fb. 312: Speech Communication

Pages: 5
Language: English
Type: PDF

Authors:
Rautenberg, Frederik; Kuhlmann, Michael; Haeb-Umbach, Reinhold (Department of Communications Engineering, Paderborn University, Germany)
Wiechmann, Jana; Seebauer, Fritz; Wagner, Petra (Phonetics Work Group, Bielefeld University, Germany)

Abstract:
Unsupervised speech disentanglement aims at separating fast-varying from slowly varying components of a speech signal. In this contribution, we take a closer look at the embedding vector representing the slowly varying signal components, commonly named the speaker embedding vector. We ask which properties of a speaker’s voice are captured, and we use the concept of Shapley values to investigate to what extent individual embedding vector components are responsible for them. Our findings show that certain speaker-specific acoustic-phonetic properties can be predicted fairly well from the speaker embedding, while the more abstract voice quality features we investigated cannot.
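
To illustrate how Shapley values attribute a prediction to individual embedding components, the following is a minimal sketch rather than the authors' implementation. It computes exact Shapley values by enumerating all coalitions of components, which is feasible only for very small dimensions. The 4-dimensional embedding, the linear predictor, the zero baseline, and all names (`shapley_values`, `predict`, `weights`) are assumptions made for illustration; the abstract does not specify the predictor or the estimation method used in the paper.

```python
from itertools import combinations
from math import factorial

import numpy as np


def shapley_values(predict, x, baseline):
    """Exact Shapley values of each component of x for predict(x).

    Components outside a coalition are set to the baseline value.
    The cost grows as 2^d, so this is only usable for a few dimensions.
    """
    d = len(x)
    phi = np.zeros(d)
    for i in range(d):
        others = [j for j in range(d) if j != i]
        for size in range(d):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(d - size - 1) / factorial(d)
            for subset in combinations(others, size):
                idx = np.array(subset, dtype=int)
                without_i = baseline.copy()
                without_i[idx] = x[idx]
                with_i = without_i.copy()
                with_i[i] = x[i]
                # Marginal contribution of component i to this coalition
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


# Hypothetical toy setup: a 4-dimensional "embedding" and a linear predictor
# of some acoustic property (both are assumptions, not taken from the paper).
weights = np.array([2.0, 0.0, -1.0, 0.5])


def predict(e):
    return float(weights @ e)


embedding = np.array([1.0, 3.0, 2.0, -4.0])
baseline = np.zeros(4)

phi = shapley_values(predict, embedding, baseline)
print(phi)
```

For a linear predictor, each Shapley value reduces to the weight times the component's deviation from the baseline, and the values sum to the difference between the prediction for the embedding and the prediction for the baseline. A component with zero weight, such as the second one here, receives zero importance.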