Integrate Spatial Information into Channel Attention via a Multi-Scale Convolutional Attention Module

Conference: CAIBDA 2022 - 2nd International Conference on Artificial Intelligence, Big Data and Algorithms
17 June 2022 - 19 June 2022 in Nanjing, China

Proceedings: CAIBDA 2022

Pages: 7
Language: English
Type: PDF

Authors:
Xu, Chentianye (Department of Artificial Intelligence, College of Computer Science, Wu Yuzhang Honors College, Sichuan University, Chengdu, China)

Abstract:
Since the introduction of attention mechanisms, convolutional networks have gained a human-like ability to focus on regions of interest in images, which has contributed to breakthroughs in many computer vision tasks. Among attention mechanisms, channel attention is a relatively mature branch. Its basic idea is that each channel carries information about a specific feature, so by adaptively adjusting the weight of each channel, the network can decide which features matter. However, almost all work on channel attention fails to integrate spatial information to guide feature selection. This paper proposes a Multi-Scale Convolutional Attention Module (MSCAM), which integrates spatial information into channel attention. By performing several single-kernel multi-scale convolutions over the input feature maps, the module outputs ‘weight feature maps’ that describe the importance of each pixel. By replacing global average pooling (GAP) with vector multiplication between the ‘weight feature maps’ and each input feature map channel, it generates attention vectors that contain richer spatial information. In this way, spatial information contributes effectively to the adaptive feature selection process. Experiments on several datasets show that the proposed module outperforms its counterparts that lack spatial information.
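The idea of replacing GAP with a learned spatial weighting can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the abstract does not specify how the weight feature maps are normalized, how the per-scale descriptors are combined, or how the attention vector is produced, so the spatial softmax, the averaging across scales, the sigmoid gate, the kernel sizes (1, 3, 5), and all function names below are assumptions made for illustration only.

```python
import numpy as np

def conv2d_single_kernel(x, kernel):
    """'Same'-padded cross-correlation of x (C, H, W) with ONE kernel (C, k, k).

    Returns a single (H, W) score map. Assumes an odd kernel size k.
    """
    _, h, w = x.shape
    k = kernel.shape[-1]
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros((h, w))
    for i in range(k):
        for j in range(k):
            # Shifted window times one kernel tap, summed over channels.
            out += np.einsum("chw,c->hw", xp[:, i:i + h, j:j + w], kernel[:, i, j])
    return out

def spatial_softmax(score):
    """Assumed normalization: turn scores into per-pixel weights summing to 1."""
    e = np.exp(score - score.max())
    return e / e.sum()

def weighted_pool(x, weight_map):
    """Stand-in for GAP: per-channel sum of weight_map * x[c] over all pixels."""
    return np.einsum("chw,hw->c", x, weight_map)

def mscam_sketch(x, kernels):
    """Hypothetical forward pass: one weight map per scale, then channel gating."""
    descriptors = []
    for kernel in kernels:
        weight_map = spatial_softmax(conv2d_single_kernel(x, kernel))
        descriptors.append(weighted_pool(x, weight_map))
    # Assumption: average the per-scale descriptors, then gate with a sigmoid.
    descriptor = np.mean(descriptors, axis=0)
    attention = 1.0 / (1.0 + np.exp(-descriptor))
    return x * attention[:, None, None], attention

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 6, 6))  # toy input: 4 channels, 6x6 pixels
kernels = [rng.standard_normal((4, k, k)) * 0.1 for k in (1, 3, 5)]
y, attention = mscam_sketch(x, kernels)
```

In this sketch, a uniform weight map (for example, one produced by all-zero kernels) makes weighted_pool return exactly the per-channel mean, so GAP appears as the special case in which every pixel is treated as equally important.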