Sign Language Recognition Based on Lightweight 3D MobileNet-v2 and Knowledge Distillation

Conference: ICETIS 2022 - 7th International Conference on Electronic Technology and Information Science
21.01.2022 - 23.01.2022 in Harbin, China

Proceedings: ICETIS 2022

Pages: 6
Language: English
Type: PDF

Authors:
Han, Xiangzu; Lu, Fei; Tian, Guohui (School of Control Science and Engineering, Shandong University, Jinan, Shandong, China)

Abstract:
Sign language, which combines gestures, facial expressions, and body postures, is the primary communication medium for deaf-mute people. Sign language recognition (SLR) aims to translate sign videos into words or sentences, facilitating communication between hearing and deaf people. With the development of deep learning, deep neural networks, especially 3D convolutional neural networks (CNNs), have been widely used in SLR. In this paper, we seek efficient spatiotemporal modeling for SLR. Specifically, we first build an efficient 3D CNN, 3D MobileNet-v2, for isolated SLR, and further improve its performance with a random knowledge distillation strategy (RKD) that transfers knowledge from multiple teacher models, including R3D, R(2+1)D, and SlowFast networks. We also use these lightweight models as spatiotemporal feature extractors within a Transformer framework for the more challenging task of continuous SLR. In experiments, the distilled models are highly efficient and perform strongly on the SLR-500 and CSL benchmarks. We conclude that the lightweight 3D MobileNet-v2 combined with the proposed RKD achieves a good balance between accuracy and efficiency, making it well suited to SLR.
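The abstract does not specify how RKD selects or weights its teachers. The sketch below is one plausible reading, not the authors' method. It assumes that each training step samples a single teacher uniformly at random from the pool (R3D, R(2+1)D, SlowFast) and combines a hard-label cross-entropy term with a standard Hinton-style distillation term, T^2 · KL(teacher || student), computed on temperature-softened outputs. The function names, the weight `alpha`, the temperature value, and the example logits are all hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax, shifted by the max logit for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits: Sequence[float],
            teacher_logits: Sequence[float],
            temperature: float = 4.0) -> float:
    """Hinton-style distillation loss: T^2 * KL(teacher || student) on softened outputs."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s) if pt > 0)
    return temperature ** 2 * kl


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Standard cross-entropy against the ground-truth class index."""
    return -math.log(softmax(logits)[label])


def random_kd_loss(student_logits: Sequence[float],
                   teacher_logits: Dict[str, Sequence[float]],
                   label: int,
                   rng: random.Random,
                   alpha: float = 0.5,
                   temperature: float = 4.0) -> Tuple[str, float]:
    """Hypothetical RKD step: pick one teacher uniformly at random, then mix hard and soft losses."""
    name = rng.choice(sorted(teacher_logits))  # sorted keys make sampling reproducible for a fixed seed
    hard = cross_entropy(student_logits, label)
    soft = kd_loss(student_logits, teacher_logits[name], temperature)
    return name, alpha * hard + (1 - alpha) * soft


if __name__ == "__main__":
    rng = random.Random(0)
    # Made-up logits for a 3-class toy problem; real models would output one logit per sign class.
    teachers = {
        "R3D": [2.0, 0.5, -1.0],
        "R(2+1)D": [1.5, 1.0, -0.5],
        "SlowFast": [2.5, 0.0, -1.5],
    }
    student = [1.0, 0.2, -0.3]
    for _ in range(3):
        name, loss = random_kd_loss(student, teachers, label=0, rng=rng)
        print(name, round(loss, 4))
```

Sampling one teacher per step keeps the per-iteration cost close to that of single-teacher distillation, while the student still sees supervision from every teacher over the course of training. Averaging the softened outputs of all teachers at every step would be a natural alternative under the same assumptions.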