Published January 1, 2022 | Version v1
Conference paper (Open Access)

Auxiliary Classifier based Residual RNN for Image Captioning

  • 1. Izmir Katip Celebi University, Electrical & Electronics Engineering Graduate Program, Izmir, Turkey
  • 2. Izmir Katip Celebi University, Department of Computer Engineering, Izmir, Turkey
  • 3. University of Surrey, Centre for Vision, Speech and Signal Processing (CVSSP), Guildford, Surrey, England

Description

Image captioning aims to automatically generate natural-language descriptions of visual content. This is useful in several potential applications, such as image understanding and virtual assistants. With recent advances in deep neural networks, the naturalness and semantic quality of generated text in image captioning has improved. However, maintaining the gradient flow between neurons in consecutive layers becomes challenging as the network gets deeper. In this paper, we propose to integrate an auxiliary classifier into the residual recurrent neural network, which enables the gradient flow to reach the bottom layers for enhanced caption generation. Experiments on the MSCOCO and VizWiz datasets demonstrate the advantage of our proposed approach over state-of-the-art approaches on several performance metrics.
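The two ingredients named in the abstract can be sketched in a few lines: a residual recurrent step, whose skip connection h + f(h, x) preserves a direct gradient path through depth, and an auxiliary classifier attached to an intermediate hidden state, which gives bottom layers a shorter route to a loss signal. The following is a minimal NumPy sketch under assumed dimensions; all names, sizes, and initializations are illustrative and do not reproduce the authors' architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden, vocab = 8, 5  # assumed toy sizes, not the paper's settings

def residual_rnn_step(h, x, W, U, b):
    # Residual update: the identity term h keeps a direct gradient path.
    return h + np.tanh(x @ W + h @ U + b)

def aux_classifier(h, V, c):
    # Auxiliary softmax head on an intermediate hidden state, providing
    # extra supervision closer to the bottom layers.
    z = h @ V + c
    e = np.exp(z - z.max())
    return e / e.sum()

# Illustrative parameters for one cell and one auxiliary head.
W = rng.normal(scale=0.1, size=(hidden, hidden))
U = rng.normal(scale=0.1, size=(hidden, hidden))
b = np.zeros(hidden)
V = rng.normal(scale=0.1, size=(hidden, vocab))
c = np.zeros(vocab)

h = np.zeros(hidden)          # initial hidden state
x = rng.normal(size=hidden)   # one input feature vector

h_next = residual_rnn_step(h, x, W, U, b)
aux_probs = aux_classifier(h_next, V, c)  # intermediate word distribution
```

In training, the auxiliary head's loss would be added (typically with a small weight) to the final captioning loss, so gradients reach early layers both through the residual path and through the auxiliary branch.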
