Dataset Open Access

TWiST: Turkish-English Wikipedia & Thesis STEM Terminology Dataset

Gebeşçe, Ali; Gül Şahin, Gözde; Amasya, Ege Uğur


JSON-LD (schema.org)

{
  "@context": "https://schema.org/", 
  "@id": "https://doi.org/10.48623/aperta.286016", 
  "@type": "Dataset", 
  "creator": [
    {
      "@id": "https://orcid.org/0000-0002-7997-0557", 
      "@type": "Person", 
      "affiliation": "Ko\u00e7 \u00dcniversitesi", 
      "name": "Gebe\u015f\u00e7e, Ali"
    }, 
    {
      "@id": "https://orcid.org/0000-0002-0332-1657", 
      "@type": "Person", 
      "affiliation": "Ko\u00e7 \u00dcniversitesi", 
      "name": "G\u00fcl \u015eahin, G\u00f6zde"
    }, 
    {
      "@type": "Person", 
      "affiliation": "Ko\u00e7 \u00dcniversitesi", 
      "name": "Amasya, Ege U\u011fur"
    }
  ], 
  "datePublished": "2025-06-25", 
  "description": "<p>We introduce <em><strong>TWiST</strong></em>&mdash;the Turkish-English Wikipedia &amp; Thesis STEM Terminology Dataset&mdash;an expertly curated, sentence-aligned bilingual resource that addresses a key gap in Turkish computational linguistics. <em>TWiST</em> is a 3,300-sentence Turkish&ndash;English parallel corpus of STEM terminology drawn from two sources: 1,185 sentences from the Wikimedia Content Translation dump and 2,115 sentences from 287 graduate-thesis abstracts at Turkey&rsquo;s top six universities. Focused on Mathematics, Physics, and Computer Science, every sentence pair was triple-annotated by 43 trained bilingual annotators following a 30-page guideline, achieving substantial agreement (Fleiss &kappa; &asymp; 0.7). <em>TWiST</em> ultimately captures 10,157 annotated term instances covering 1,223 distinct English technical terms, offering a high-quality benchmark for bilingual terminology extraction, translation consistency, and terminology-aware NLP.</p>", 
  "distribution": [
    {
      "@type": "DataDownload", 
      "contentUrl": "https://aperta.ulakbim.gov.tr/api/files/d985fe9d-ba91-4a3a-95bd-284bd2e58505/dataset.json", 
      "fileFormat": "json"
    }
  ], 
  "identifier": "https://doi.org/10.48623/aperta.286016", 
  "inLanguage": {
    "@type": "Language", 
    "alternateName": "eng", 
    "name": "English"
  }, 
  "keywords": [], 
  "license": "https://creativecommons.org/licenses/by-nc/4.0/", 
  "name": "TWiST: Turkish-English Wikipedia & Thesis STEM Terminology Dataset", 
  "url": "https://aperta.ulakbim.gov.tr/record/286016", 
  "version": "version_1"
}
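
The record above is plain JSON, so it can be read with a standard JSON parser. The following is a minimal sketch in Python, not part of the original record: the `RECORD` string is a trimmed copy of the export (the description is shortened to its opening clause), and the helper names `plain_text` and `summarize` are hypothetical. It shows two details that are easy to miss: the `\uXXXX` escapes in the creator names decode to Turkish characters, and the `description` field holds HTML that needs tag stripping and entity decoding before use.

```python
import html
import json
import re

# Trimmed copy of the JSON-LD record above; the description is shortened
# to its opening clause for illustration.
RECORD = r'''
{
  "@id": "https://doi.org/10.48623/aperta.286016",
  "@type": "Dataset",
  "creator": [
    {"@type": "Person", "name": "Gebe\u015f\u00e7e, Ali"},
    {"@type": "Person", "name": "G\u00fcl \u015eahin, G\u00f6zde"},
    {"@type": "Person", "name": "Amasya, Ege U\u011fur"}
  ],
  "description": "<p>We introduce <em><strong>TWiST</strong></em>&mdash;the Turkish-English Wikipedia &amp; Thesis STEM Terminology Dataset</p>",
  "distribution": [
    {"@type": "DataDownload",
     "contentUrl": "https://aperta.ulakbim.gov.tr/api/files/d985fe9d-ba91-4a3a-95bd-284bd2e58505/dataset.json",
     "fileFormat": "json"}
  ],
  "license": "https://creativecommons.org/licenses/by-nc/4.0/"
}
'''


def plain_text(fragment: str) -> str:
    """Remove HTML tags, then decode entities such as &mdash; and &amp;."""
    no_tags = re.sub(r"<[^>]+>", "", fragment)
    return html.unescape(no_tags).strip()


def summarize(record: dict) -> dict:
    """Collect the fields most useful for citing or downloading the dataset."""
    return {
        "doi": record["@id"],
        "creators": [person["name"] for person in record["creator"]],
        "downloads": [
            item["contentUrl"]
            for item in record.get("distribution", [])
            if item.get("@type") == "DataDownload"
        ],
        "license": record.get("license"),
        "description": plain_text(record.get("description", "")),
    }


summary = summarize(json.loads(RECORD))
print(summary["creators"])
print(summary["downloads"][0])
```

Tags are stripped before entities are decoded so that an escaped `&lt;` in the text is not mistaken for markup. The regular expression is adequate for the simple inline tags in this record; a full HTML parser would be the safer choice for arbitrary input.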