Visual question answering model based on graph neural network

Conference: ICETIS 2022 - 7th International Conference on Electronic Technology and Information Science
01/21/2022 - 01/23/2022 at Harbin, China

Proceedings: ICETIS 2022

Pages: 5
Language: English
Type: PDF

Authors:
Wu, Xiru; Chen, Nancong (College of Electronic Engineering and Automation, Guilin University of Electronic Technology, Guilin, Guangxi, China)

Abstract:
In recent years, multi-modal visual question answering (VQA) technology based on the fusion of image visual features and question text features has attracted wide attention from researchers. However, traditional VQA models ignore the dynamic semantic relationships between the two modalities and the rich spatial structure among different image regions. To address this, a multi-modal visual question answering model based on a graph neural network is proposed, enabling the model to fully capture the dynamic interactions between objects in the visual scene and the contextual representation of the question text. The graph neural network learns the feature information and the structure information of the graph simultaneously and aligns the text information with the image information, thereby bridging the semantic gap between the two modalities. Experimental results on the VQA 2.0 dataset show that the proposed model significantly outperforms the comparison methods on the evaluation metrics and effectively improves the accuracy of visual question answering.
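The paper itself is not reproduced here, but the approach the abstract describes, image regions as graph nodes, message passing over their spatial relations, and question-guided fusion, can be illustrated with a minimal numpy sketch. All dimensions, weight matrices, and the single-step aggregation scheme below are illustrative assumptions, not the authors' architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes (not from the paper): 4 image regions, feature dim 8.
num_regions, d = 4, 8

# Region (node) features, as would come from a CNN-based region detector.
H = rng.normal(size=(num_regions, d))

# Adjacency matrix encoding spatial relations between regions (1 = related).
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]], dtype=float)

# Question embedding, as would come from a recurrent text encoder.
q = rng.normal(size=(d,))

# One graph message-passing step: each node combines its own features
# with a degree-normalized sum over its neighbours, then applies ReLU.
W_self = rng.normal(size=(d, d)) * 0.1
W_nei = rng.normal(size=(d, d)) * 0.1
deg = A.sum(axis=1, keepdims=True)
H_new = np.maximum(H @ W_self + (A / deg) @ H @ W_nei, 0.0)

# Question-guided attention: softmax over region scores against q,
# so text information is explicitly aligned with image regions.
scores = H_new @ q
alpha = np.exp(scores - scores.max())
alpha /= alpha.sum()

# Attention-weighted pooling gives a fused multi-modal vector that a
# downstream classifier would map to an answer.
fused = alpha @ H_new
print(fused.shape)
```

In a full model this step would be stacked, the weights learned end-to-end, and the fused vector passed to an answer classifier; the sketch only shows how graph structure and the question jointly shape the pooled representation.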