Description of Turkish Paraphrase Corpus Structure and Generation Method

Karaoglan, Bahar; Kisla, Tarik; Metin, Senem Kumova

doi:10.1007/978-3-319-75477-2_13

Published January 1, 2018 | Version v1

Conference paper | Open access


1. Ege Univ, Izmir, Turkey
2. Izmir Univ Econ, Izmir, Turkey

Because developing a corpus takes a long time and a great deal of human effort, it is desirable to make it as resourceful as possible: rich in coverage, flexible, multipurpose and expandable. Here we describe the steps we took in developing a Turkish paraphrase corpus, the factors we considered, the problems we faced and how we dealt with them. Currently our corpus contains nearly 4000 sentences, with a ratio of 60% paraphrase to 40% non-paraphrase sentence pairs. The sentence pairs are annotated on a five-level scale: paraphrase, encapsulating, encapsulated, non-paraphrase and opposite. The corpus is organized as a database integrated with a Turkish dictionary. The sources we have used so far are news texts from the Bilcon 2005 corpus, a set of professionally translated sentence pairs from the MSRP corpus, multiple Turkish translations from different languages included in the Tatoeba corpus, and user-generated paraphrases.
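As a rough illustration of the annotation scheme described in the abstract, the sketch below models a labeled sentence pair and computes the share of each label in a collection. Only the five label names come from the abstract; the class names, the field names (`first`, `second`, `source`) and the helper `label_shares` are hypothetical and do not reflect the authors' actual database schema.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Label(Enum):
    """The five annotation labels named in the abstract."""
    PARAPHRASE = "paraphrase"
    ENCAPSULATING = "encapsulating"
    ENCAPSULATED = "encapsulated"
    NON_PARAPHRASE = "non-paraphrase"
    OPPOSITE = "opposite"


@dataclass(frozen=True)
class SentencePair:
    """One annotated pair; all field names here are assumptions."""
    first: str
    second: str
    label: Label
    source: str  # hypothetical provenance tag, e.g. "news" or "user"


def label_shares(pairs):
    """Return the fraction of pairs carrying each label (empty dict if no pairs)."""
    total = len(pairs)
    if total == 0:
        return {}
    counts = Counter(pair.label for pair in pairs)
    return {label: counts[label] / total for label in Label}


if __name__ == "__main__":
    # Invented placeholder pairs, not taken from the corpus.
    demo = [
        SentencePair("sentence A", "sentence A reworded", Label.PARAPHRASE, "user"),
        SentencePair("sentence B", "unrelated sentence", Label.NON_PARAPHRASE, "news"),
    ]
    for label, share in label_shares(demo).items():
        print(f"{label.value}: {share:.0%}")
```

A helper like this could be used to check a reported class balance against the data, though how the five labels map onto the paraphrase and non-paraphrase groups in the 60%/40% figure is not stated in the abstract.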

Files

bib-eb115c39-0c99-42e3-80e7-784a3a220e7f.txt (198 Bytes, md5:a934d48a119348aa8ea487e17a3fee80)

