Dataset Open Access
Gebeşçe, Ali;
Gül Şahin, Gözde;
Amasya, Ege Uğur
<?xml version='1.0' encoding='UTF-8'?>
<record xmlns="http://www.loc.gov/MARC21/slim">
<leader>00000nmm##2200000uu#4500</leader>
<datafield tag="041" ind1=" " ind2=" ">
<subfield code="a">eng</subfield>
</datafield>
<datafield tag="909" ind1="C" ind2="O">
<subfield code="o">oai:aperta.ulakbim.gov.tr:286016</subfield>
</datafield>
<datafield tag="520" ind1=" " ind2=" ">
<subfield code="a">We introduce TWiST, the Turkish-English Wikipedia &amp; Thesis STEM Terminology Dataset: an expertly curated, sentence-aligned bilingual resource that addresses a key gap in Turkish computational linguistics. TWiST is a 3,300-sentence Turkish-English parallel corpus of STEM terminology drawn from two sources: 1,185 sentences from the Wikimedia Content Translation dump and 2,115 sentences from 287 graduate-thesis abstracts at Turkey's top six universities. The corpus focuses on Mathematics, Physics, and Computer Science. Every sentence pair was triple-annotated by 43 trained bilingual annotators following a 30-page guideline, achieving substantial agreement (Fleiss' κ ≈ 0.7). In total, TWiST contains 10,157 annotated term instances covering 1,223 distinct English technical terms, offering a high-quality benchmark for bilingual terminology extraction, translation consistency, and terminology-aware NLP.</subfield>
</datafield>
<datafield tag="980" ind1=" " ind2=" ">
<subfield code="a">dataset</subfield>
</datafield>
<datafield tag="540" ind1=" " ind2=" ">
<subfield code="a">Creative Commons Attribution-NonCommercial</subfield>
<subfield code="u">https://creativecommons.org/licenses/by-nc/4.0/</subfield>
</datafield>
<datafield tag="773" ind1=" " ind2=" ">
<subfield code="i">isVersionOf</subfield>
<subfield code="a">10.48623/aperta.286015</subfield>
<subfield code="n">doi</subfield>
</datafield>
<datafield tag="100" ind1=" " ind2=" ">
<subfield code="0">(orcid)0000-0002-7997-0557</subfield>
<subfield code="a">Gebeşçe, Ali</subfield>
<subfield code="u">Koç Üniversitesi</subfield>
</datafield>
<datafield tag="856" ind1="4" ind2=" ">
<subfield code="z">md5:65c835744f5cccc043b06e7eb713f522</subfield>
<subfield code="s">13867478</subfield>
<subfield code="u">https://aperta.ulakbim.gov.tr/record/286016/files/dataset.json</subfield>
</datafield>
<controlfield tag="005">20250625101411.0</controlfield>
<datafield tag="260" ind1=" " ind2=" ">
<subfield code="c">2025-06-25</subfield>
</datafield>
<datafield tag="024" ind1=" " ind2=" ">
<subfield code="a">10.48623/aperta.286016</subfield>
<subfield code="2">doi</subfield>
</datafield>
<datafield tag="542" ind1=" " ind2=" ">
<subfield code="l">open</subfield>
</datafield>
<datafield tag="245" ind1=" " ind2=" ">
<subfield code="a">TWiST: Turkish-English Wikipedia &amp; Thesis STEM Terminology Dataset</subfield>
</datafield>
<datafield tag="650" ind1="1" ind2="7">
<subfield code="a">cc-by</subfield>
<subfield code="2">opendefinition.org</subfield>
</datafield>
<datafield tag="700" ind1=" " ind2=" ">
<subfield code="0">(orcid)0000-0002-0332-1657</subfield>
<subfield code="a">Gül Şahin, Gözde</subfield>
<subfield code="u">Koç Üniversitesi</subfield>
</datafield>
<datafield tag="700" ind1=" " ind2=" ">
<subfield code="a">Amasya, Ege Uğur</subfield>
<subfield code="u">Koç Üniversitesi</subfield>
</datafield>
<controlfield tag="001">286016</controlfield>
</record>
| | All versions | This version |
|---|---|---|
| Views | 0 | 0 |
| Downloads | 0 | 0 |
| Data volume | 0 Bytes | 0 Bytes |
| Unique views | 0 | 0 |
| Unique downloads | 0 | 0 |
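
Field 856 of the record lists an MD5 checksum and a size for `dataset.json`, so a downloaded copy can be checked against them. The following is a minimal sketch, not part of the record itself: the helper names `file_md5` and `matches_record` are hypothetical, and it assumes that subfield `s` gives the file size in bytes.

```python
import hashlib
from pathlib import Path

# Values copied from field 856 of the record.
EXPECTED_MD5 = "65c835744f5cccc043b06e7eb713f522"
EXPECTED_SIZE = 13867478  # assumed to be a size in bytes


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(
    path: Path,
    expected_md5: str = EXPECTED_MD5,
    expected_size: int = EXPECTED_SIZE,
) -> bool:
    """Check a local file against the size and checksum from the record."""
    # Compare the cheap size first, then the checksum.
    return path.stat().st_size == expected_size and file_md5(path) == expected_md5
```

After downloading the file to a local path of your choice, `matches_record(Path("dataset.json"))` should return `True` if the copy is intact and the size assumption holds.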