Investigating disentanglement of speaker identity and characteristics through user experience
Conference: Speech Communication - 15th ITG Conference
20-22 September 2023 in Aachen
doi:10.30420/456164046
Proceedings: ITG-Fb. 312: Speech Communication
Pages: 5
Language: English
Type: PDF
Authors:
Rallabandi, Sai Sirisha (Quality and Usability Lab, Technische Universität Berlin, Germany)
Moeller, Sebastian (Quality and Usability Lab, Technische Universität Berlin, Germany & Speech and Language Technology, Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI), Berlin, Germany)
Abstract:
In this paper, we investigate the disentanglement of speaker-specific information in voice-converted female synthetic voices. We divide this speaker-specific information into two categories: a) speaker identity and b) social speaker characteristics. We investigated the separability and interdependence of these two categories on the basis of user experience, using five evaluation methods: a) speech quality, b) intelligibility, c) a semantic differential scaling test, d) a speaker similarity test, and e) a characteristic similarity test. Analysis of the subjective results shows that intelligibility significantly affects listeners' ratings in the other evaluation methods. We also observed statistically significant differences between the results of the speaker similarity and characteristic similarity tests.