Translating images to words for recognizing objects in large image and video collections

Duygulu, Pinar; Bastan, Muhammet; Forsyth, David

doi:10.81043/aperta.41783

1 January 2006 · Conference paper · Open Access

MARC21 XML

<?xml version='1.0' encoding='UTF-8'?>
<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00000nam##2200000uu#4500</leader>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="a">Bastan, Muhammet</subfield>
    <subfield code="u">Bilkent Univ, Dept Comp Engn, Ankara, Turkey</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="a">Forsyth, David</subfield>
    <subfield code="u">Univ Illinois, Urbana, IL USA</subfield>
  </datafield>
  <datafield tag="980" ind1=" " ind2=" ">
    <subfield code="a">user-tubitak-destekli-proje-yayinlari</subfield>
  </datafield>
  <datafield tag="540" ind1=" " ind2=" ">
    <subfield code="a">Creative Commons Attribution</subfield>
    <subfield code="u">http://www.opendefinition.org/licenses/cc-by</subfield>
  </datafield>
  <datafield tag="773" ind1=" " ind2=" ">
    <subfield code="i">isVersionOf</subfield>
    <subfield code="a">10.81043/aperta.41782</subfield>
    <subfield code="n">doi</subfield>
  </datafield>
  <datafield tag="024" ind1=" " ind2=" ">
    <subfield code="a">10.81043/aperta.41783</subfield>
    <subfield code="2">doi</subfield>
  </datafield>
  <datafield tag="245" ind1=" " ind2=" ">
    <subfield code="a">Translating images to words for recognizing objects in large image and video collections</subfield>
  </datafield>
  <datafield tag="100" ind1=" " ind2=" ">
    <subfield code="a">Duygulu, Pinar</subfield>
    <subfield code="u">Bilkent Univ, Dept Comp Engn, Ankara, Turkey</subfield>
  </datafield>
  <datafield tag="909" ind1="C" ind2="O">
    <subfield code="o">oai:zenodo.org:41783</subfield>
    <subfield code="p">user-tubitak-destekli-proje-yayinlari</subfield>
  </datafield>
  <datafield tag="650" ind1="1" ind2="7">
    <subfield code="2">opendefinition.org</subfield>
    <subfield code="a">cc-by</subfield>
  </datafield>
  <datafield tag="260" ind1=" " ind2=" ">
    <subfield code="c">2006-01-01</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="u">https://aperta.ulakbim.gov.tr/record/41783/files/bib-a67dfe63-8be7-44a9-8ab7-b74fbcaa764a.txt</subfield>
    <subfield code="z">md5:3de6abcc79770f06fc760650526a48a6</subfield>
    <subfield code="s">176</subfield>
  </datafield>
  <datafield tag="542" ind1=" " ind2=" ">
    <subfield code="l">open</subfield>
  </datafield>
  <controlfield tag="005">20210315204626.0</controlfield>
  <controlfield tag="001">41783</controlfield>
  <datafield tag="980" ind1=" " ind2=" ">
    <subfield code="a">publication</subfield>
    <subfield code="b">conferencepaper</subfield>
  </datafield>
  <datafield tag="711" ind1=" " ind2=" ">
    <subfield code="a">TOWARD CATEGORY-LEVEL OBJECT RECOGNITION</subfield>
  </datafield>
  <datafield tag="520" ind1=" " ind2=" ">
    <subfield code="a">We present a new approach to the object recognition problem, motivated by the recent availability of large annotated image and video collections. This approach considers object recognition as the translation of visual elements to words, similar to the translation of text from one language to another. The visual elements represented in feature space are categorized into a finite set of blobs. The correspondences between the blobs and the words are learned, using a method adapted from Statistical Machine Translation. Once learned, these correspondences can be used to predict words corresponding to particular image regions (region naming), to predict words associated with the entire images (auto-annotation), or to associate the speech transcript text with the correct video frames (video alignment). We present our results on the Corel data set which consists of annotated images and on the TRECVID 2004 data set which consists of video frames associated with speech transcript text and manual annotations.</subfield>
  </datafield>
</record>
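
The abstract in field 520 describes learning correspondences between blobs and words with a method adapted from statistical machine translation. The record does not say which model is used, so the following is only a minimal sketch that assumes an IBM Model 1 style EM procedure. The function names (`train_translation_table`, `predict_word`), the blob labels, and the toy annotated images are all hypothetical and chosen for illustration; they are not taken from the paper or from the Corel or TRECVID data.

```python
from collections import defaultdict


def train_translation_table(pairs, iterations=20):
    """Estimate t[word][blob] = p(word | blob) with IBM Model 1 style EM.

    `pairs` is a list of (blobs, words) tuples, one per annotated image.
    This is an assumed stand-in for the method mentioned in the abstract.
    """
    blob_vocab = {b for blobs, _ in pairs for b in blobs}
    word_vocab = {w for _, words in pairs for w in words}

    # Start from a uniform distribution over words for every blob.
    t = {w: {b: 1.0 / len(word_vocab) for b in blob_vocab} for w in word_vocab}

    for _ in range(iterations):
        count = defaultdict(float)  # expected (word, blob) alignment counts
        total = defaultdict(float)  # expected counts per blob

        # E-step: share each word among the blobs of the same image.
        for blobs, words in pairs:
            for w in words:
                norm = sum(t[w][b] for b in blobs)
                if norm == 0.0:
                    continue
                for b in blobs:
                    frac = t[w][b] / norm
                    count[(w, b)] += frac
                    total[b] += frac

        # M-step: renormalise so that the words sum to one for each blob.
        for w in word_vocab:
            for b in blob_vocab:
                t[w][b] = count[(w, b)] / total[b] if total[b] > 0 else 0.0
    return t


def predict_word(t, blob):
    """Region naming: return the most probable word for one blob."""
    return max(t, key=lambda w: t[w][blob])


# Hypothetical toy collection: blob labels per image with its annotation words.
toy_pairs = [
    (["b_sky", "b_grass"], ["sky", "grass"]),
    (["b_sky", "b_water"], ["sky", "water"]),
    (["b_grass", "b_tiger"], ["grass", "tiger"]),
]
table = train_translation_table(toy_pairs)
names = {b: predict_word(table, b) for b in ["b_sky", "b_grass", "b_water", "b_tiger"]}
```

In this sketch, auto-annotation of a whole image could be approximated by combining the per-blob word distributions of its regions, although the record does not state how the authors do this.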

Views	44
Downloads	7
Data volume	1.2 kB
Unique views	42
Unique downloads	7

Record Information

Publication date: 01/01/2006
Conference information: TOWARD CATEGORY-LEVEL OBJECT RECOGNITION
License: Creative Commons Attribution
