Sign Language Recognition Based on Lightweight 3D MobileNet-v2 and Knowledge Distillation
Conference: ICETIS 2022 - 7th International Conference on Electronic Technology and Information Science
01/21/2022 - 01/23/2022 at Harbin, China
Proceedings: ICETIS 2022
Pages: 6
Language: English
Type: PDF
Authors:
Han, Xiangzu; Lu, Fei; Tian, Guohui (School of Control Science and Engineering, Shandong University, Jinan, Shandong, China)
Abstract:
Sign language, which comprises gestures, facial expressions, and body postures, is the primary communication medium for deaf-mute people. Sign language recognition (SLR) aims to translate sign videos into words or sentences, promoting communication between hearing and deaf people. Recently, driven by advances in deep learning, deep neural networks, especially 3D convolutional neural networks (CNNs), have been widely used in SLR. In this paper, we pursue efficient spatiotemporal modeling for SLR. Specifically, we first build an efficient 3D CNN, 3D MobileNet-v2, for isolated SLR, and further enhance its performance by designing a random knowledge distillation (RKD) strategy that transfers knowledge from multiple teacher models, including the R3D, R(2+1)D, and SlowFast networks. We also apply these lightweight models as spatiotemporal feature extractors within a Transformer framework for the more challenging continuous SLR task. In experiments, the distilled models show high efficiency and strong performance on the SLR-500 and CSL benchmarks. We conclude that the lightweight 3D MobileNet-v2 with the proposed RKD achieves a balance between accuracy and efficiency and is well suited for SLR.
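To illustrate the distillation idea described above, the following is a minimal NumPy sketch of a random-teacher knowledge distillation step: at each step, one of several teacher models (e.g., R3D, R(2+1)D, SlowFast) is picked at random and the student is penalized by the KL divergence between the temperature-softened teacher and student distributions. Function names, the temperature value, and the logit-bank interface are illustrative assumptions, not the authors' implementation.

```python
import random
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-softened softmax over the last axis."""
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(student_logits, teacher_logits, T=4.0):
    """KL(teacher || student) on softened distributions, scaled by T^2
    (the usual Hinton-style distillation scaling; T=4.0 is illustrative)."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return float((p * (np.log(p) - np.log(q))).sum(axis=-1).mean() * T * T)

def random_kd_step(student_logits, teacher_logit_bank, rng=random):
    """Pick one teacher at random per step and compute the KD loss
    against its logits (a sketch of the random-teacher selection)."""
    name, t_logits = rng.choice(list(teacher_logit_bank.items()))
    return name, kd_loss(student_logits, t_logits)
```

In practice this distillation term would be combined with the standard cross-entropy loss on ground-truth labels when training the 3D MobileNet-v2 student.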