Conference paper · Open Access
Bilgin Taşdemir, Esma F.; Tandoğan, Zeynep; Akansu, S. Doğan; Kızılırmak, Fırat; Şen, Umut; Akca, Aysu; Kuru, Mehmet; Yanıkoğlu, Berrin
<?xml version='1.0' encoding='UTF-8'?>
<record xmlns="http://www.loc.gov/MARC21/slim">
<leader>00000nam##2200000uu#4500</leader>
<datafield tag="653" ind1=" " ind2=" ">
<subfield code="a">Ottoman Document Recognition</subfield>
</datafield>
<datafield tag="653" ind1=" " ind2=" ">
<subfield code="a">Deep Learning</subfield>
</datafield>
<datafield tag="653" ind1=" " ind2=" ">
<subfield code="a">Transcription</subfield>
</datafield>
<datafield tag="041" ind1=" " ind2=" ">
<subfield code="a">eng</subfield>
</datafield>
<datafield tag="909" ind1="C" ind2="O">
<subfield code="o">oai:aperta.ulakbim.gov.tr:274262</subfield>
</datafield>
<datafield tag="520" ind1=" " ind2=" ">
<subfield code="a"><p>With the accelerated pace of digitization, a vast collection of Ottoman documents has become accessible to researchers and the general public. However, most users interested in these documents are unable to read them, because the text is Turkish written in the Arabic-Persian script. Manual transcription of such a massive number of documents is also beyond the capacity of human experts. Building on advances in deep learning, we provide a solution to the long-standing problem of automatically transcribing printed Ottoman documents. We evaluated three decoding strategies, including Word Beam Search, which allows a recognition lexicon and n-gram statistics to be used during the decoding phase. We also evaluated the effect of lexicon size and coverage, and of language modelling with character or word n-grams. Using a large general-purpose lexicon of the Ottoman era (260K words, 86% test coverage), we measured a character error rate of 6.59% and a word error rate of 28.46% on a test set of 6,828 text lines.</p></subfield>
</datafield>
<datafield tag="980" ind1=" " ind2=" ">
<subfield code="a">publication</subfield>
<subfield code="b">conferencepaper</subfield>
</datafield>
<datafield tag="540" ind1=" " ind2=" ">
<subfield code="a">Creative Commons Attribution Share-Alike</subfield>
<subfield code="u">http://www.opendefinition.org/licenses/cc-by-sa</subfield>
</datafield>
<datafield tag="773" ind1=" " ind2=" ">
<subfield code="i">isVersionOf</subfield>
<subfield code="a">10.48623/aperta.274261</subfield>
<subfield code="n">doi</subfield>
</datafield>
<datafield tag="100" ind1=" " ind2=" ">
<subfield code="a">Bilgin Taşdemir, Esma F.</subfield>
<subfield code="u">Medeniyet Üniversitesi</subfield>
</datafield>
<datafield tag="856" ind1="4" ind2=" ">
<subfield code="z">md5:7b5baefc21ac43df114ee607aa2748b4</subfield>
<subfield code="s">643415</subfield>
<subfield code="u">https://aperta.ulakbim.gov.tr/record/274262/files/DAS-2024.pdf</subfield>
</datafield>
<controlfield tag="005">20250203152714.0</controlfield>
<datafield tag="260" ind1=" " ind2=" ">
<subfield code="c">2024-08-30</subfield>
</datafield>
<datafield tag="024" ind1=" " ind2=" ">
<subfield code="a">10.48623/aperta.274262</subfield>
<subfield code="2">doi</subfield>
</datafield>
<datafield tag="542" ind1=" " ind2=" ">
<subfield code="l">open</subfield>
</datafield>
<datafield tag="245" ind1=" " ind2=" ">
<subfield code="a">Automatic Transcription of Ottoman Documents Using Deep Learning</subfield>
</datafield>
<datafield tag="650" ind1="1" ind2="7">
<subfield code="a">cc-by</subfield>
<subfield code="2">opendefinition.org</subfield>
</datafield>
<datafield tag="700" ind1=" " ind2=" ">
<subfield code="a">Tandoğan, Zeynep</subfield>
</datafield>
<datafield tag="700" ind1=" " ind2=" ">
<subfield code="a">Akansu, S. Doğan</subfield>
</datafield>
<datafield tag="700" ind1=" " ind2=" ">
<subfield code="a">Kızılırmak, Fırat</subfield>
</datafield>
<datafield tag="700" ind1=" " ind2=" ">
<subfield code="a">Şen, Umut</subfield>
<subfield code="u">Sabancı Üniversitesi</subfield>
</datafield>
<datafield tag="700" ind1=" " ind2=" ">
<subfield code="a">Akca, Aysu</subfield>
<subfield code="u">Viyana Üniversitesi</subfield>
</datafield>
<datafield tag="700" ind1=" " ind2=" ">
<subfield code="a">Kuru, Mehmet</subfield>
<subfield code="u">Sabancı Üniversitesi</subfield>
</datafield>
<datafield tag="700" ind1=" " ind2=" ">
<subfield code="a">Yanıkoğlu, Berrin</subfield>
<subfield code="u">Sabancı Üniversitesi</subfield>
</datafield>
<controlfield tag="001">274262</controlfield>
</record>
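The two figures reported in the abstract (6.59% character error rate, 28.46% word error rate) are edit-distance metrics. The sketch below shows one common way to compute them; it is not the authors' evaluation code. It assumes that errors are counted with the Levenshtein distance, that spaces count as characters for the character error rate, that words are split on whitespace, and that the rate is aggregated over the whole test set (total edits divided by total reference length) rather than averaged per line. The helper names `levenshtein` and `error_rate` are hypothetical.

```python
def levenshtein(ref, hyp):
    """Minimum number of substitutions, insertions and deletions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))  # distance from an empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion of a reference symbol
                curr[j - 1] + 1,         # insertion of a hypothesis symbol
                prev[j - 1] + (r != h),  # substitution, free when symbols match
            ))
        prev = curr
    return prev[-1]


def error_rate(refs, hyps, unit):
    """Corpus-level error rate (assumed aggregation): total edits / total reference units.

    unit="char" compares characters (spaces included, an assumption);
    unit="word" compares whitespace-separated tokens.
    """
    edits = total = 0
    for ref, hyp in zip(refs, hyps):
        r = list(ref) if unit == "char" else ref.split()
        h = list(hyp) if unit == "char" else hyp.split()
        edits += levenshtein(r, h)
        total += len(r)
    return edits / total  # assumes at least one non-empty reference
```

For example, `error_rate(["kitab okudu"], ["kitap okudu"], "char")` gives 1/11 (one substitution in eleven reference characters), while the same pair scored with `unit="word"` gives 0.5. A single wrong letter makes the whole word wrong, which is one reason a word error rate is typically several times larger than the corresponding character error rate.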
| | All versions | This version |
|---|---|---|
| Views | 259 | 259 |
| Downloads | 74 | 74 |
| Data volume | 47.6 MB | 47.6 MB |
| Unique views | 182 | 182 |
| Unique downloads | 65 | 65 |