Knowledge distillation based on channel attention

Conference: CIBDA 2022 - 3rd International Conference on Computer Information and Big Data Applications
03/25/2022 - 03/27/2022 at Wuhan, China

Proceedings: CIBDA 2022

Pages: 7
Language: English
Type: PDF

Authors:
Meng, Xianfa; Liu, Fang (National Key Laboratory of Science and Technology on Automatic Target Recognition, National Defense University of Science and Technology, Changsha, Hunan, China)

Abstract:
With the development of convolutional neural networks (CNNs), their depth and width keep increasing, which drives up the demand for computing resources and storage space. Knowledge distillation transfers knowledge extracted from a teacher network as an additional supervision signal to guide the training of a lightweight student network. As an effective method of network compression, it has seen considerable research progress across multiple types of tasks. To address the problem of information redundancy in feature maps, we introduce a novel approach, dubbed Channel Attention Knowledge Distillation (CAKD). By extracting channel-weight knowledge, the student learns the channel weight assignment of the teacher network and then corrects the channel information of its own feature maps. Extensive experiments on multiple datasets show that the proposed method significantly improves the performance of student networks compared with other knowledge distillation methods of the same type.
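
The abstract only outlines the idea of matching channel weights between teacher and student. As a rough illustration, the sketch below shows one plausible way such a channel-attention distillation term could look: per-channel statistics are obtained by global average pooling and normalized with a softmax, and the student's channel distribution is pulled toward the teacher's via KL divergence. The function names, the pooling-plus-softmax form of the attention, and the temperature parameter are assumptions for illustration, not the paper's actual CAKD formulation.

```python
import torch
import torch.nn.functional as F


def channel_logits(feat: torch.Tensor) -> torch.Tensor:
    """Global-average-pool a feature map (N, C, H, W) into per-channel
    statistics (N, C) that can be normalized into channel weights."""
    return feat.mean(dim=(2, 3))


def channel_attention_kd_loss(student_feat: torch.Tensor,
                              teacher_feat: torch.Tensor,
                              temperature: float = 4.0) -> torch.Tensor:
    """Hypothetical channel-attention distillation term: KL divergence
    between the teacher's and student's channel-weight distributions."""
    s_log_attn = F.log_softmax(channel_logits(student_feat) / temperature, dim=1)
    t_attn = F.softmax(channel_logits(teacher_feat) / temperature, dim=1)
    return F.kl_div(s_log_attn, t_attn, reduction="batchmean")


if __name__ == "__main__":
    # Toy check: features from one teacher layer and one student layer
    # with matching channel counts (otherwise align them first, e.g. with a 1x1 conv).
    student_feat = torch.randn(8, 256, 16, 16)
    teacher_feat = torch.randn(8, 256, 16, 16)
    loss = channel_attention_kd_loss(student_feat, teacher_feat)
    print(f"channel-attention KD loss: {loss.item():.4f}")
```

In practice such a term would be added to the student's task loss with a weighting coefficient; the exact attention definition and loss used by CAKD are specified in the full paper.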