Speaker Identification Via the Relation Network: a Meta-Learning Method

Conference: ICETIS 2022 - 7th International Conference on Electronic Technology and Information Science
January 21-23, 2022, in Harbin, China

Proceedings: ICETIS 2022

Pages: 7
Language: English
Type: PDF

Authors:
Li, Yarong; Ke, Xianxin; Ping, Hu (School of Mechanical and Electrical Engineering and Automation, Shanghai University, Shanghai, China)

Abstract:
Speaker identification (SI) can be categorized as closed-set or open-set according to the characteristics of the data set. In the open-set case, the label spaces of the training phase and the test phase are disjoint, which makes this essentially a few-shot learning problem. In this study, spectrogram features are obtained from raw audio data, and a convolution-based speaker identification model is trained end-to-end on these features within the framework of the relation network (RN), a meta-learning method based on deep metric learning. Our approach is validated on the VoxCeleb2 dataset. Compared with a model trained with the prototypical network loss (PNL), which also leverages meta-learning for speaker identification, RN outperforms PNL and needs less data. In the absence of novel categories, the identification accuracy of the best-trained model reaches 85.44% when there are only 3 support samples in each category and 71.98% when there is only 1 support sample. In addition to demonstrating the effectiveness of this method, we also suggest possible directions for improvement.
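To make the few-shot setup concrete, the following is a minimal sketch of how a relation-network-style model might score one query utterance against a handful of support samples per speaker. It is an illustration under stated assumptions, not the authors' implementation: the `embed` and `relation_score` functions are toy linear stand-ins for the convolutional embedding and relation modules, the weights are random rather than trained, the inputs are random vectors rather than spectrograms, and pooling the K support embeddings by element-wise summation is one common choice assumed here.

```python
import math
import random


def embed(features, weights):
    """Toy stand-in for the convolutional embedding: linear map followed by ReLU."""
    return [max(0.0, sum(w * x for w, x in zip(row, features))) for row in weights]


def relation_score(class_feature, query_feature, rel_weights):
    """Toy relation module: linear layer over the concatenated pair, squashed by a sigmoid."""
    pair = class_feature + query_feature  # list concatenation
    z = sum(w * x for w, x in zip(rel_weights, pair))
    return 1.0 / (1.0 + math.exp(-z))


def classify_episode(support, query, weights, rel_weights):
    """Score a query against every speaker in the support set.

    support: dict mapping speaker id -> list of K feature vectors.
    Returns the best-scoring speaker id and the full score dict.
    """
    query_feature = embed(query, weights)
    scores = {}
    for speaker, samples in support.items():
        embedded = [embed(s, weights) for s in samples]
        # Assumption: pool the K support embeddings by element-wise sum.
        class_feature = [sum(column) for column in zip(*embedded)]
        scores[speaker] = relation_score(class_feature, query_feature, rel_weights)
    return max(scores, key=scores.get), scores


# Hypothetical 5-way 3-shot episode with random (untrained) weights and inputs.
rng = random.Random(0)
IN_DIM, EMB_DIM = 8, 4
weights = [[rng.uniform(-1, 1) for _ in range(IN_DIM)] for _ in range(EMB_DIM)]
rel_weights = [rng.uniform(-1, 1) for _ in range(2 * EMB_DIM)]
support = {
    f"speaker_{i}": [[rng.random() for _ in range(IN_DIM)] for _ in range(3)]
    for i in range(5)
}
query = [rng.random() for _ in range(IN_DIM)]

predicted, scores = classify_episode(support, query, weights, rel_weights)
print(predicted, scores)
```

Because the relation score is produced by a learned module rather than a fixed distance, the comparison function itself is trained; this is the main point of contrast with a prototypical-network-style approach, which compares embeddings with a predefined metric.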