Dataset | Open Access

TWiST: Turkish-English Wikipedia & Thesis STEM Terminology Dataset

Gebeşçe, Ali; Gül Şahin, Gözde; Amasya, Ege Uğur


JSON

{
  "conceptdoi": "10.48623/aperta.286015", 
  "conceptrecid": "286015", 
  "created": "2025-06-25T10:14:11.701497+00:00", 
  "doi": "10.48623/aperta.286016", 
  "files": [
    {
      "bucket": "d985fe9d-ba91-4a3a-95bd-284bd2e58505", 
      "checksum": "md5:65c835744f5cccc043b06e7eb713f522", 
      "key": "dataset.json", 
      "links": {
        "self": "https://aperta.ulakbim.gov.tr/api/files/d985fe9d-ba91-4a3a-95bd-284bd2e58505/dataset.json"
      }, 
      "size": 13867478, 
      "type": "json"
    }
  ], 
  "id": 286016, 
  "links": {
    "badge": "https://aperta.ulakbim.gov.tr/badge/doi/10.48623/aperta.286016.svg", 
    "bucket": "https://aperta.ulakbim.gov.tr/api/files/d985fe9d-ba91-4a3a-95bd-284bd2e58505", 
    "conceptbadge": "https://aperta.ulakbim.gov.tr/badge/doi/10.48623/aperta.286015.svg", 
    "conceptdoi": "https://doi.org/10.48623/aperta.286015", 
    "doi": "https://doi.org/10.48623/aperta.286016", 
    "html": "https://aperta.ulakbim.gov.tr/record/286016", 
    "latest": "https://aperta.ulakbim.gov.tr/api/records/286016", 
    "latest_html": "https://aperta.ulakbim.gov.tr/record/286016"
  }, 
  "metadata": {
    "access_right": "open", 
    "access_right_category": "success", 
    "creators": [
      {
        "affiliation": "Ko\u00e7 \u00dcniversitesi", 
        "name": "Gebe\u015f\u00e7e, Ali", 
        "orcid": "0000-0002-7997-0557"
      }, 
      {
        "affiliation": "Ko\u00e7 \u00dcniversitesi", 
        "name": "G\u00fcl \u015eahin, G\u00f6zde", 
        "orcid": "0000-0002-0332-1657"
      }, 
      {
        "affiliation": "Ko\u00e7 \u00dcniversitesi", 
        "name": "Amasya, Ege U\u011fur"
      }
    ], 
    "description": "<p>We introduce <em><strong>TWiST</strong></em>&mdash;the Turkish-English Wikipedia &amp; Thesis STEM Terminology Dataset&mdash;an expertly curated, sentence-aligned bilingual resource that addresses a key gap in Turkish computational linguistics. <em>TWiST</em> is a 3,300-sentence Turkish&ndash;English parallel corpus of STEM terminology drawn from two sources: 1,185 sentences from Wikimedia Content Translation dump and 2,115 sentences from 287 graduate-thesis abstracts at Turkey&rsquo;s top six universities. Focused on Mathematics, Physics, and Computer Science, every sentence pair was triple-annotated by 43 trained bilingual annotators following a 30-page guideline, achieving substantial agreement (Fleiss &kappa; &asymp; 0.7). <em>TWiST</em> ultimately captures 10,157 annotated term instances covering 1,223 distinct English technical terms, offering a high-quality benchmark for bilingual terminology extraction, translation consistency, and terminology-aware NLP.</p>", 
    "doi": "10.48623/aperta.286016", 
    "has_grant": false, 
    "keywords": [], 
    "language": "eng", 
    "license": {
      "id": "cc-by-nc-4.0"
    }, 
    "publication_date": "2025-06-25", 
    "related_identifiers": [
      {
        "identifier": "10.48623/aperta.286015", 
        "relation": "isVersionOf", 
        "scheme": "doi"
      }
    ], 
    "relations": {
      "version": [
        {
          "count": 1, 
          "index": 0, 
          "is_last": true, 
          "last_child": {
            "pid_type": "recid", 
            "pid_value": "286016"
          }, 
          "parent": {
            "pid_type": "recid", 
            "pid_value": "286015"
          }
        }
      ]
    }, 
    "resource_type": {
      "title": "Veri seti", 
      "type": "dataset"
    }, 
    "science_branches": [
      "Teknik Bilimler > Bilgisayar Bilimleri > Yapay Zeka, Bilgisayarda \u00d6\u011frenme ve \u00d6r\u00fcnt\u00fc Tan\u0131ma > Do\u011fal Dil \u0130\u015flemesi"
    ], 
    "title": "TWiST: Turkish-English Wikipedia & Thesis STEM Terminology Dataset", 
    "version": "version_1"
  }, 
  "owners": [
    2871
  ], 
  "revision": 1, 
  "stats": {
    "downloads": 0.0, 
    "unique_downloads": 0.0, 
    "unique_views": 0.0, 
    "version_downloads": 0.0, 
    "version_unique_downloads": 0.0, 
    "version_unique_views": 0.0, 
    "version_views": 0.0, 
    "version_volume": 0.0, 
    "views": 0.0, 
    "volume": 0.0
  }, 
  "updated": "2025-06-25T10:14:11.768882+00:00"
}
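The `links` and `files` entries above are enough to fetch the dataset programmatically. The following Python sketch is a minimal illustration, not an official Aperta client. It assumes that the `latest` API URL (`https://aperta.ulakbim.gov.tr/api/records/286016`) returns JSON with the same `files` structure shown above, and that each checksum is written as `md5:<hex digest>`. The helper names `fetch_json`, `file_entries`, `verify`, and `download_and_verify` are hypothetical. The sketch downloads each listed file and checks it against the size and MD5 digest recorded in the metadata.

```python
import hashlib
import json
import urllib.request

# Taken from the "latest" link in the record above. ASSUMPTION: this endpoint
# returns JSON shaped like the record shown, with a top-level "files" list.
RECORD_URL = "https://aperta.ulakbim.gov.tr/api/records/286016"


def fetch_json(url: str) -> dict:
    """Download a URL and parse the response body as JSON."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def file_entries(record: dict) -> list[tuple[str, str, str, int]]:
    """Return (key, download URL, MD5 hex digest, size in bytes) for each file."""
    entries = []
    for f in record.get("files", []):
        # ASSUMPTION: checksums use the "md5:<hex>" form seen in this record.
        algo, _, digest = f["checksum"].partition(":")
        if algo != "md5":
            raise ValueError(f"unsupported checksum algorithm: {algo}")
        entries.append((f["key"], f["links"]["self"], digest, f["size"]))
    return entries


def verify(data: bytes, expected_md5: str, expected_size: int) -> bool:
    """Check downloaded bytes against the size and MD5 digest in the record."""
    return len(data) == expected_size and hashlib.md5(data).hexdigest() == expected_md5


def download_and_verify(record_url: str = RECORD_URL) -> dict[str, bytes]:
    """Fetch the record, download every listed file, and verify each one."""
    record = fetch_json(record_url)
    out = {}
    for key, url, md5, size in file_entries(record):
        with urllib.request.urlopen(url) as resp:
            data = resp.read()
        if not verify(data, md5, size):
            raise ValueError(f"checksum or size mismatch for {key}")
        out[key] = data
    return out
```

For example, `files = download_and_verify()` followed by `json.loads(files["dataset.json"])` would load the 13,867,478-byte dataset file. The record does not describe the internal schema of `dataset.json`, so inspect its structure before relying on particular fields.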