On Feature Importance and Interpretability of Speaker Representations

Conference: Speech Communication - 15th ITG Conference
09/20/2023 - 09/22/2023 in Aachen

doi:10.30420/456164037

Proceedings: ITG-Fb. 312: Speech Communication

Pages: 5

Language: English

Type: PDF

Authors:
Rautenberg, Frederik; Kuhlmann, Michael; Haeb-Umbach, Reinhold (Department of Communications Engineering, Paderborn University, Germany)
Wiechmann, Jana; Seebauer, Fritz; Wagner, Petra (Phonetics Work Group, Bielefeld University, Germany)

Abstract:
Unsupervised speech disentanglement aims at separating fast-varying from slowly varying components of a speech signal. In this contribution, we take a closer look at the embedding vector representing the slowly varying signal components, commonly called the speaker embedding vector. We ask which properties of a speaker’s voice are captured, and we use the concept of Shapley values to investigate to what extent individual embedding vector components are responsible for them. Our findings show that certain speaker-specific acoustic-phonetic properties can be predicted fairly well from the speaker embedding, while the more abstract voice quality features we investigated cannot.
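
To illustrate the idea of attributing a prediction to individual embedding components with Shapley values, here is a minimal sketch. It is not the authors' implementation: the abstract does not say how the Shapley values are estimated. The function `shapley_values`, the predictor `toy_predictor`, the 3-dimensional "embedding", and the all-zero baseline are all hypothetical, chosen only so that the exact computation stays small. Real speaker embeddings have far more dimensions, where this exhaustive enumeration is infeasible and approximations are normally used.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict(x) relative to predict(baseline).

    A component that is 'absent' from a coalition is replaced by its
    baseline value (an assumption of this sketch; other choices exist).
    """
    n = len(x)

    def value(subset):
        # Components in `subset` keep their actual value, the rest use the baseline.
        z = [x[j] if j in subset else baseline[j] for j in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude component i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                s = set(coalition)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_predictor(e):
    # Hypothetical stand-in for a model that predicts an acoustic
    # property from an embedding; includes one interaction term.
    return 2.0 * e[0] - 1.0 * e[1] + 0.5 * e[0] * e[2]


embedding = [1.0, 2.0, 4.0]   # hypothetical embedding vector
baseline = [0.0, 0.0, 0.0]    # hypothetical reference point

phi = shapley_values(toy_predictor, embedding, baseline)
for i, p in enumerate(phi):
    print(f"component {i}: {p:+.3f}")
```

The attributions sum to the difference between the prediction at the embedding and the prediction at the baseline, and the interaction term between components 0 and 2 is split equally between them.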
