Conference paper Open Access

Turkish Treebanking: Unifying and Constructing Efforts

   Turk, Utku; Atmaca, Furkan; Ozates, Saziye Betul; Koksal, Abdullatif; Ozturk, Balkiz; Gungor, Tunga; Ozgur, Arzucan

In this paper, we present the re-annotation of the Turkish PUD Treebank and the first annotation of the Turkish National Corpus Universal Dependency (henceforth TNC-UD) Treebank as part of our efforts to unify and extend the Turkish universal dependency treebanks. Both treebanks were revised with regard to their syntactic relations, in accordance with the Universal Dependencies guidelines and the requirements of Turkish grammar. The TNC-UD is planned to contain 10,000 sentences; here we present its first 500 sentences along with the re-annotated PUD Treebank. We also report the parsing results of a graph-based neural parser on the previous and re-annotated PUD, as well as on the TNC-UD. Although we observe a slight decrease in the attachment scores on the Turkish PUD Treebank, we demonstrate that the annotation of the TNC-UD improves the parsing accuracy of Turkish. In addition to the treebanks, we have built custom annotation software with advanced filtering and morphological editing options. Both treebanks, including a full edit history and the annotation guidelines, as well as the custom software, are publicly available online under an open license.
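
Attachment scores for dependency parsers are usually reported as the unlabeled and labeled attachment scores (UAS and LAS). The following is a minimal sketch of how such scores can be computed, assuming each token's analysis is given as a (head index, relation label) pair. The function name `attachment_scores` and the toy four-token sentence are illustrative assumptions, not taken from the paper or its evaluation setup.

```python
from typing import List, Tuple

# One arc per token: (index of the head token, dependency relation label).
Arc = Tuple[int, str]


def attachment_scores(gold: List[Arc], pred: List[Arc]) -> Tuple[float, float]:
    """Return (UAS, LAS) for one token-aligned gold/predicted analysis."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted analyses must have the same length")
    if not gold:
        raise ValueError("cannot score an empty analysis")

    # UAS counts tokens whose predicted head matches the gold head.
    uas_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))
    # LAS additionally requires the relation label to match.
    las_hits = sum(g == p for g, p in zip(gold, pred))

    n = len(gold)
    return uas_hits / n, las_hits / n


# Hypothetical four-token sentence (head 0 denotes the root).
gold = [(2, "nsubj"), (0, "root"), (4, "amod"), (2, "obj")]
pred = [(2, "nsubj"), (0, "root"), (2, "amod"), (2, "nmod")]

uas, las = attachment_scores(gold, pred)
print(f"UAS = {uas:.2f}, LAS = {las:.2f}")
```

In this toy case three of the four heads are correct and two of the four arcs also carry the correct label, so LAS is never higher than UAS.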
