Conference paper Open Access
Karaoglan, Bahar; Kisla, Tarik; Metin, Senem Kumova
<?xml version='1.0' encoding='UTF-8'?> <record xmlns="http://www.loc.gov/MARC21/slim"> <leader>00000nam##2200000uu#4500</leader> <datafield tag="245" ind1=" " ind2=" "> <subfield code="a">Description of Turkish Paraphrase Corpus Structure and Generation Method</subfield> </datafield> <datafield tag="024" ind1=" " ind2=" "> <subfield code="a">10.1007/978-3-319-75477-2_13</subfield> <subfield code="2">doi</subfield> </datafield> <controlfield tag="001">33379</controlfield> <datafield tag="980" ind1=" " ind2=" "> <subfield code="a">user-tubitak-destekli-proje-yayinlari</subfield> </datafield> <datafield tag="520" ind1=" " ind2=" "> <subfield code="a">Because developing a corpus requires a long time and lots of human effort, it is desirable to make it as resourceful as possible: rich in coverage, flexible, multipurpose and expandable. Here we describe the steps we took in the development of Turkish paraphrase corpus, the factors we considered, problems we faced and how we dealt with them. Currently our corpus contains nearly 4000 sentences with the ratio of 60% paraphrase and 40% non-paraphrase sentence pairs. The sentence pairs are annotated at 5-scale: paraphrase, encapsulating, encapsulated, non-paraphrase and opposite. The corpus is formulated in a database structure integrated with Turkish dictionary. 
The sources we used till now are news texts from Bilcon 2005 corpus, a set of professionally translated sentence pairs from MSRP corpus, multiple Turkish translations from different languages that are involved in Tatoeba corpus and user generated paraphrases.</subfield> </datafield> <datafield tag="650" ind1="1" ind2="7"> <subfield code="2">opendefinition.org</subfield> <subfield code="a">cc-by</subfield> </datafield> <datafield tag="700" ind1=" " ind2=" "> <subfield code="u">Ege Univ, Izmir, Turkey</subfield> <subfield code="a">Kisla, Tarik</subfield> </datafield> <datafield tag="700" ind1=" " ind2=" "> <subfield code="u">Izmir Univ Econ, Izmir, Turkey</subfield> <subfield code="a">Metin, Senem Kumova</subfield> </datafield> <datafield tag="980" ind1=" " ind2=" "> <subfield code="b">conferencepaper</subfield> <subfield code="a">publication</subfield> </datafield> <datafield tag="542" ind1=" " ind2=" "> <subfield code="l">open</subfield> </datafield> <datafield tag="100" ind1=" " ind2=" "> <subfield code="u">Ege Univ, Izmir, Turkey</subfield> <subfield code="a">Karaoglan, Bahar</subfield> </datafield> <datafield tag="711" ind1=" " ind2=" "> <subfield code="a">COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING, (CICLING 2016), PT I</subfield> </datafield> <datafield tag="260" ind1=" " ind2=" "> <subfield code="c">2018-01-01</subfield> </datafield> <controlfield tag="005">20210315185456.0</controlfield> <datafield tag="909" ind1="C" ind2="O"> <subfield code="o">oai:zenodo.org:33379</subfield> <subfield code="p">user-tubitak-destekli-proje-yayinlari</subfield> </datafield> <datafield tag="856" ind1="4" ind2=" "> <subfield code="z">md5:a934d48a119348aa8ea487e17a3fee80</subfield> <subfield code="s">198</subfield> <subfield code="u">https://aperta.ulakbim.gov.trrecord/33379/files/bib-eb115c39-0c99-42e3-80e7-784a3a220e7f.txt</subfield> </datafield> <datafield tag="540" ind1=" " ind2=" "> <subfield code="u">http://www.opendefinition.org/licenses/cc-by</subfield> 
<subfield code="a">Creative Commons Attribution</subfield> </datafield> </record>
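The MARCXML export above stores each bibliographic field as a `datafield` identified by a numeric `tag`, with the values held in coded `subfield` elements. As a minimal sketch of how such a record could be read, the following uses Python's standard `xml.etree.ElementTree`. The helper name `subfields` and the constant `SAMPLE` are hypothetical, introduced only for this illustration, and `SAMPLE` is an abbreviated copy of the record that keeps just the title, DOI and author fields.

```python
import xml.etree.ElementTree as ET

# Namespace declared on the <record> element of the export.
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Abbreviated copy of the record (assumption: only a few fields are kept).
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="245" ind1=" " ind2=" ">
    <subfield code="a">Description of Turkish Paraphrase Corpus Structure and Generation Method</subfield>
  </datafield>
  <datafield tag="024" ind1=" " ind2=" ">
    <subfield code="a">10.1007/978-3-319-75477-2_13</subfield>
    <subfield code="2">doi</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">Ege Univ, Izmir, Turkey</subfield>
    <subfield code="a">Kisla, Tarik</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">Izmir Univ Econ, Izmir, Turkey</subfield>
    <subfield code="a">Metin, Senem Kumova</subfield>
  </datafield>
  <datafield tag="100" ind1=" " ind2=" ">
    <subfield code="u">Ege Univ, Izmir, Turkey</subfield>
    <subfield code="a">Karaoglan, Bahar</subfield>
  </datafield>
</record>"""


def subfields(record, tag, code):
    """Return the text of every subfield `code` in datafields with `tag`."""
    path = f"marc:datafield[@tag='{tag}']/marc:subfield[@code='{code}']"
    return [sf.text for sf in record.findall(path, MARC_NS)]


record = ET.fromstring(SAMPLE)

title = subfields(record, "245", "a")[0]   # 245$a: title statement
doi = subfields(record, "024", "a")[0]     # 024$a: identifier (here a DOI)
# 100$a is the main author entry, 700$a the added author entries.
authors = subfields(record, "100", "a") + subfields(record, "700", "a")

print(title)
print(doi)
print("; ".join(authors))
```

Looking fields up by tag and subfield code, rather than by position, matters here because the export does not list the fields in numeric order (the `100` field appears after the `700` fields).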
Views | 48 |
Downloads | 11 |
Data volume | 2.2 kB |
Unique views | 44 |
Unique downloads | 11 |