Published January 1, 2022
| Version v1
Journal article
Open
Caption generation on scenes with seen and unseen object categories
Creators
- 1. Middle East Tech Univ, Dept Comp Engn, TR-06800 Ankara, Turkey
Description
Image caption generation is one of the most challenging problems at the intersection of vision and language domains. In this work, we propose a realistic captioning task where the input scenes may incorporate visual objects with no corresponding visual or textual training examples. For this problem, we propose a detection-driven approach that consists of a single-stage generalized zero-shot detection model to recognize and localize instances of both seen and unseen classes, and a template-based captioning model that transforms detections into sentences. To improve the generalized zero-shot detection model, which provides essential information for captioning, we define effective class representations in terms of class-to-class semantic similarities, and leverage their special structure to construct an effective unseen/seen class confidence score calibration mechanism. We also propose a novel evaluation metric that provides additional insights for the captioning outputs by separately measuring the visual and non-visual contents of generated sentences. Our experiments highlight the importance of studying captioning in the proposed zero-shot setting, and verify the effectiveness of the proposed detection-driven zero-shot captioning approach. © 2022 Elsevier B.V. All rights reserved.
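As a rough illustration of what "a template-based captioning model that transforms detections into sentences" can mean, the sketch below fills a fixed sentence template with the class names of confident detections. It is not the authors' model: the function name `caption_from_detections`, the `label`/`score` dictionary keys, the `min_score` threshold, and the "A photo of ..." template are all assumptions made for this example.

```python
from collections import Counter


def caption_from_detections(detections, min_score=0.5):
    """Fill a fixed sentence template with detected class names.

    `detections` is a list of dicts with hypothetical keys "label" (str)
    and "score" (float); this data layout is an assumption of the sketch.
    """
    # Keep only detections whose confidence reaches the threshold.
    labels = [d["label"] for d in detections if d["score"] >= min_score]
    if not labels:
        return "A photo."

    # Count instances per class; Counter keeps first-seen order.
    counts = Counter(labels)
    phrases = [
        f"a {label}" if n == 1 else f"{n} {label}s"  # naive pluralization
        for label, n in counts.items()
    ]

    # Join the noun phrases into a simple English listing.
    if len(phrases) == 1:
        listing = phrases[0]
    else:
        listing = ", ".join(phrases[:-1]) + " and " + phrases[-1]
    return f"A photo of {listing}."


detections = [
    {"label": "dog", "score": 0.9},
    {"label": "zebra", "score": 0.8},  # could stand in for an unseen class
    {"label": "dog", "score": 0.7},
    {"label": "cat", "score": 0.2},    # dropped: below the threshold
]
print(caption_from_detections(detections))
```

Because the sentence is assembled from detector outputs rather than learned from paired captions, a class with no captioned training examples can still appear in the output, provided the detector recognizes it.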
Files
bib-6005f1aa-85b5-4d44-9b5f-e22c7a29b6cc.txt
| Name | Size |
|---|---|
| bib-6005f1aa-85b5-4d44-9b5f-e22c7a29b6cc.txt (md5:382381950d945c37e0b718dc0adf66f6) | 140 Bytes |