Multi-level Distance Metric Learning for Few-shot Spoken term classification

Conference: ICETIS 2022 - 7th International Conference on Electronic Technology and Information Science
01/21/2022 - 01/23/2022 at Harbin, China

Proceedings: ICETIS 2022

Pages: 6Language: englishTyp: PDF

Authors:
Wu, Yiqi; Ma, Zhongchen (School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang, Jiangsu, China)
Mao, Qirong (School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang, Jiangsu, China & Jiangsu Engineering Research Center of Big Data Ubiquitous Perception and Intelligent Agriculture Applications, Zhenjiang, Jiangsu, China)

Abstract:
Spoken term classification has attracted great attention of researchers because of its wide application in real life. Traditional methods use a deep neural network trained with tremendous samples to classify a particular keyword which may not be applicable in case of insufficient samples and restrict the system from recognizing new or user-defined keywords. Few-shot learning algorithm is then applied to this speech task since it can use only a few examples to adapt to classifying new classes. Metric learning for few-shot learning aims to learn a metric in episode learning such that the distance between the positive pair of each episode is closed and the distance between the negative pair of each episode is far away. However, due to various pronunciation, speed or intonation, some spoken samples may be extremely dissimilar to its positive sample, but very similar to its negative sample, leading these samples to be ignored or overly influenced in metric learning. Therefore, to cope with this problem we propose a multi-level distance regularized metric learning method for few-shot spoken term classification. It explicitly disturbs a learning procedure by regularizing pairwise distances between embedding vectors into multiple levels. In this way, the degree of similarity between a pair can be fully utilized and the importance of each sample in metric learning is considered in a balanced manner. Multi-level Distance Regularization with simple Prototypical Network achieves the state-of-the-art performance in benchmark dataset: Google Speech Commands dataset.