The face encoder of speech portrait based on the ResNet

Conference: ISCTT 2022 - 7th International Conference on Information Science, Computer Technology and Transportation
May 27-29, 2022 in Xishuangbanna, China

Proceedings: ISCTT 2022

Pages: 7
Language: English
Type: PDF

Authors:
Wang, Yuanyuan; Bu, Fanliang; Zhang, Tengfei; Hu, Zhexin; Yao, Yutong (People's Public Security University of China, Beijing, China)

Abstract:
Speech portrait is a research focus in cross-modal recognition. To improve the accuracy and specificity of speech portraits, an improved speech-face portrait algorithm is proposed, which comprises a speech feature encoder and a face feature encoder. In the face encoder, the facial feature extraction network is built on ResNet-50, and the model parameters are optimized by training on the JAFFE and AVSpeech datasets. Visualization of the experimental results shows that the improved model extracts facial features well, encodes the main features, and effectively improves facial feature extraction performance.
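
For orientation, the sketch below traces the feature-map shape through a standard ResNet-50 backbone, the network family the face encoder is described as building on. It follows the commonly published ResNet-50 layout, not the authors' improved model, whose modifications are not given in the abstract. The 224x224 input size and all function and constant names are assumptions made for illustration.

```python
# Shape walk-through of a *standard* ResNet-50 backbone.
# Assumption: this is the commonly published layout, not the paper's modified network.

STAGE_BLOCKS = [3, 4, 6, 3]               # bottleneck blocks per stage
STAGE_CHANNELS = [256, 512, 1024, 2048]   # output channels per stage
STAGE_STRIDES = [1, 2, 2, 2]              # spatial stride applied by each stage


def conv_out(size, kernel, stride, padding):
    """Spatial size after a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


def resnet50_feature_shape(input_size=224):
    """Return (channels, height, width) of the final stage's feature map.

    input_size=224 is an assumed value; the abstract does not state the
    resolution of the face images.
    """
    size = conv_out(input_size, kernel=7, stride=2, padding=3)  # stem 7x7 conv
    size = conv_out(size, kernel=3, stride=2, padding=1)        # 3x3 max pool
    channels = 64
    for stage_channels, stride in zip(STAGE_CHANNELS, STAGE_STRIDES):
        # Each stage changes resolution once, via a strided 3x3 convolution.
        size = conv_out(size, kernel=3, stride=stride, padding=1)
        channels = stage_channels
    return channels, size, size


def weighted_layer_count():
    """One stem conv, three convs per bottleneck block, one final FC layer."""
    return 1 + 3 * sum(STAGE_BLOCKS) + 1


def embedding_dim():
    """Vector length after global average pooling over the final feature map."""
    return STAGE_CHANNELS[-1]


if __name__ == "__main__":
    print(resnet50_feature_shape(224))
    print(weighted_layer_count())
    print(embedding_dim())
```

Under these assumptions, global average pooling over the last feature map yields a fixed-length vector per face image, which is the kind of output a face feature encoder would pass on. Whether the paper keeps this dimension or adds a projection layer is not stated in the abstract.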