Speech-based Age and Gender Prediction with Transformers

Conference: Speech Communication - 15th ITG Conference
20.09.2023-22.09.2023 in Aachen

doi:10.30420/456164008

Proceedings: ITG-Fb. 312: Speech Communication

Pages: 5
Language: English
Type: PDF

Authors:
Burkhardt, Felix; Wagner, Johannes; Wierstorf, Hagen; Eyben, Florian (audEERING GmbH, Germany)
Schuller, Bjoern (audEERING GmbH, Germany & Chair EIHW, University of Augsburg, Germany & GLAM, Imperial College London, UK)

Abstract:
We report on the curation of several publicly available datasets for age and gender prediction. Furthermore, we present experiments to predict age and gender with models based on a pre-trained wav2vec 2.0 model. Depending on the dataset, we achieve a mean absolute error (MAE) between 7.1 and 10.8 years for age, and an accuracy (ACC) of at least 91.1% for gender (female, male, child). Compared to a modelling approach built on handcrafted features, our proposed system shows an improvement of 9% UAR (unweighted average recall) for age and 4% UAR for gender. To make our findings reproducible, we release the best-performing model to the community, as well as the sample lists of the data splits.
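
The abstract reports results as MAE, ACC, and UAR without spelling out how they are computed. The following is a minimal sketch of the standard definitions of these three metrics; it is not the authors' evaluation code. The function names and the toy labels and predictions are hypothetical and chosen only for illustration; they are not data from the paper.

```python
from collections import defaultdict


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted values (e.g. age in years)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class equals the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def unweighted_average_recall(y_true, y_pred):
    """Mean of the per-class recalls; every class counts equally, regardless of its size."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Hypothetical toy data, for illustration only (not results from the paper).
ages_true = [25, 40, 63, 8]
ages_pred = [30, 35, 60, 12]
gender_true = ["female", "female", "male", "child"]
gender_pred = ["female", "male", "male", "child"]

mae = mean_absolute_error(ages_true, ages_pred)
acc = accuracy(gender_true, gender_pred)
uar = unweighted_average_recall(gender_true, gender_pred)

print(f"MAE: {mae:.2f} years, ACC: {acc:.2%}, UAR: {uar:.2%}")
```

UAR differs from ACC whenever classes are imbalanced: in the toy data one of the two "female" samples is misclassified, so ACC is 3/4 while UAR averages the per-class recalls (1/2, 1, 1).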