Dataset Open Access

TWiST: Turkish-English Wikipedia & Thesis STEM Terminology Dataset

Gebeşçe, Ali; Gül Şahin, Gözde; Amasya, Ege Uğur


MARC21 XML

<?xml version='1.0' encoding='UTF-8'?>
<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00000nmm##2200000uu#4500</leader>
  <datafield tag="041" ind1=" " ind2=" ">
    <subfield code="a">eng</subfield>
  </datafield>
  <datafield tag="909" ind1="C" ind2="O">
    <subfield code="o">oai:aperta.ulakbim.gov.tr:286016</subfield>
  </datafield>
  <datafield tag="520" ind1=" " ind2=" ">
    <subfield code="a">&lt;p&gt;We introduce &lt;em&gt;&lt;strong&gt;TWiST&lt;/strong&gt;&lt;/em&gt;&amp;mdash;the Turkish-English Wikipedia &amp;amp; Thesis STEM Terminology Dataset&amp;mdash;an expertly curated, sentence-aligned bilingual resource that addresses a key gap in Turkish computational linguistics. &lt;em&gt;TWiST&lt;/em&gt; is a 3,300-sentence Turkish&amp;ndash;English parallel corpus of STEM terminology drawn from two sources: 1,185 sentences from the Wikimedia Content Translation dump and 2,115 sentences from 287 graduate-thesis abstracts at Turkey&amp;rsquo;s top six universities. Focused on Mathematics, Physics, and Computer Science, every sentence pair was triple-annotated by 43 trained bilingual annotators following a 30-page guideline, achieving substantial agreement (Fleiss &amp;kappa; &amp;asymp; 0.7). &lt;em&gt;TWiST&lt;/em&gt; ultimately captures 10,157 annotated term instances covering 1,223 distinct English technical terms, offering a high-quality benchmark for bilingual terminology extraction, translation consistency, and terminology-aware NLP.&lt;/p&gt;</subfield>
  </datafield>
  <datafield tag="980" ind1=" " ind2=" ">
    <subfield code="a">dataset</subfield>
  </datafield>
  <datafield tag="540" ind1=" " ind2=" ">
    <subfield code="a">Creative Commons Attribution-NonCommercial</subfield>
    <subfield code="u">https://creativecommons.org/licenses/by-nc/4.0/</subfield>
  </datafield>
  <datafield tag="773" ind1=" " ind2=" ">
    <subfield code="i">isVersionOf</subfield>
    <subfield code="a">10.48623/aperta.286015</subfield>
    <subfield code="n">doi</subfield>
  </datafield>
  <datafield tag="100" ind1=" " ind2=" ">
    <subfield code="0">(orcid)0000-0002-7997-0557</subfield>
    <subfield code="a">Gebeşçe, Ali</subfield>
    <subfield code="u">Koç Üniversitesi</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="z">md5:65c835744f5cccc043b06e7eb713f522</subfield>
    <subfield code="s">13867478</subfield>
    <subfield code="u">https://aperta.ulakbim.gov.tr/record/286016/files/dataset.json</subfield>
  </datafield>
  <controlfield tag="005">20250625101411.0</controlfield>
  <datafield tag="260" ind1=" " ind2=" ">
    <subfield code="c">2025-06-25</subfield>
  </datafield>
  <datafield tag="024" ind1=" " ind2=" ">
    <subfield code="a">10.48623/aperta.286016</subfield>
    <subfield code="2">doi</subfield>
  </datafield>
  <datafield tag="542" ind1=" " ind2=" ">
    <subfield code="l">open</subfield>
  </datafield>
  <datafield tag="245" ind1=" " ind2=" ">
    <subfield code="a">TWiST: Turkish-English Wikipedia &amp; Thesis STEM Terminology Dataset</subfield>
  </datafield>
  <datafield tag="650" ind1="1" ind2="7">
    <subfield code="a">cc-by</subfield>
    <subfield code="2">opendefinition.org</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="0">(orcid)0000-0002-0332-1657</subfield>
    <subfield code="a">Gül Şahin, Gözde</subfield>
    <subfield code="u">Koç Üniversitesi</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="a">Amasya, Ege Uğur</subfield>
    <subfield code="u">Koç Üniversitesi</subfield>
  </datafield>
  <controlfield tag="001">286016</controlfield>
</record>