Dataset Open Access

TWiST: Turkish-English Wikipedia & Thesis STEM Terminology Dataset

Gebeşçe, Ali; Gül Şahin, Gözde; Amasya, Ege Uğur


Citation Style Language JSON

{
  "DOI": "10.48623/aperta.286015", 
  "abstract": "<p>We introduce <em><strong>TWiST</strong></em>&mdash;the Turkish-English Wikipedia &amp; Thesis STEM Terminology Dataset&mdash;an expertly curated, sentence-aligned bilingual resource that addresses a key gap in Turkish computational linguistics. <em>TWiST</em> is a 3,300-sentence Turkish&ndash;English parallel corpus of STEM terminology drawn from two sources: 1,185 sentences from Wikimedia Content Translation dump and 2,115 sentences from 287 graduate-thesis abstracts at Turkey&rsquo;s top six universities. Focused on Mathematics, Physics, and Computer Science, every sentence pair was triple-annotated by 43 trained bilingual annotators following a 30-page guideline, achieving substantial agreement (Fleiss &kappa; &asymp; 0.7). <em>TWiST</em> ultimately captures 10,157 annotated term instances covering 1,223 distinct English technical terms, offering a high-quality benchmark for bilingual terminology extraction, translation consistency, and terminology-aware NLP.</p>", 
  "author": [
    {
      "family": "Gebe\u015f\u00e7e", 
      "given": " Ali"
    }, 
    {
      "family": "G\u00fcl \u015eahin", 
      "given": " G\u00f6zde"
    }, 
    {
      "family": "Amasya", 
      "given": " Ege U\u011fur"
    }
  ], 
  "id": "286016", 
  "issued": {
    "date-parts": [
      [
        2025, 
        6, 
        25
      ]
    ]
  }, 
  "language": "eng", 
  "title": "TWiST: Turkish-English Wikipedia & Thesis STEM Terminology Dataset", 
  "type": "dataset", 
  "version": "version_1"
}
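
The record above can be consumed programmatically. The following is a minimal sketch, not part of the repository page, that parses an abbreviated copy of the record with Python's standard library and builds a one-line citation. The helper names `plain_text` and `format_citation` and the citation layout are assumptions made for illustration; they do not correspond to any official CSL style.

```python
import html
import json
import re

# Abbreviated copy of the CSL JSON record above (abstract truncated for brevity).
CSL_RECORD = r'''
{
  "DOI": "10.48623/aperta.286015",
  "abstract": "<p>We introduce <em><strong>TWiST</strong></em>&mdash;the Turkish-English Wikipedia &amp; Thesis STEM Terminology Dataset</p>",
  "author": [
    {"family": "Gebe\u015f\u00e7e", "given": " Ali"},
    {"family": "G\u00fcl \u015eahin", "given": " G\u00f6zde"},
    {"family": "Amasya", "given": " Ege U\u011fur"}
  ],
  "id": "286016",
  "issued": {"date-parts": [[2025, 6, 25]]},
  "language": "eng",
  "title": "TWiST: Turkish-English Wikipedia & Thesis STEM Terminology Dataset",
  "type": "dataset",
  "version": "version_1"
}
'''


def plain_text(markup: str) -> str:
    """Drop HTML tags, then decode entities such as &amp; and &mdash;."""
    without_tags = re.sub(r"<[^>]+>", "", markup)
    return html.unescape(without_tags)


def format_citation(record: dict) -> str:
    """Build an illustrative 'Family, Given; ... (Year). Title. DOI-URL' string."""
    # strip() removes the leading space the export leaves in the "given" fields.
    authors = "; ".join(
        f"{a['family'].strip()}, {a['given'].strip()}" for a in record["author"]
    )
    # CSL stores dates as nested lists: [[year, month, day]].
    year = record["issued"]["date-parts"][0][0]
    return f"{authors} ({year}). {record['title']}. https://doi.org/{record['DOI']}"


record = json.loads(CSL_RECORD)
print(format_citation(record))
print(plain_text(record["abstract"]))
```

The abstract is stored as an HTML fragment, so tags are stripped before entities are decoded; this keeps any escaped angle brackets in the text from being mistaken for markup.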