Dataset Open Access

TWiST: Turkish-English Wikipedia & Thesis STEM Terminology Dataset

Gebeşçe, Ali; Gül Şahin, Gözde; Amasya, Ege Uğur


Dublin Core

<?xml version='1.0' encoding='utf-8'?>
<oai_dc:dc xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
  <dc:creator>Gebeşçe, Ali</dc:creator>
  <dc:creator>Gül Şahin, Gözde</dc:creator>
  <dc:creator>Amasya, Ege Uğur</dc:creator>
  <dc:date>2025-06-25</dc:date>
  <dc:description>We introduce TWiST, the Turkish-English Wikipedia &amp; Thesis STEM Terminology Dataset, an expertly curated, sentence-aligned bilingual resource that addresses a key gap in Turkish computational linguistics. TWiST is a 3,300-sentence Turkish–English parallel corpus of STEM terminology drawn from two sources: 1,185 sentences from the Wikimedia Content Translation dump and 2,115 sentences from 287 graduate-thesis abstracts at Turkey’s top six universities. The corpus focuses on Mathematics, Physics, and Computer Science; every sentence pair was triple-annotated by 43 trained bilingual annotators following a 30-page guideline, achieving substantial agreement (Fleiss κ ≈ 0.7). In total, TWiST captures 10,157 annotated term instances covering 1,223 distinct English technical terms, offering a high-quality benchmark for bilingual terminology extraction, translation consistency, and terminology-aware NLP.</dc:description>
  <dc:identifier>https://aperta.ulakbim.gov.tr/record/286016</dc:identifier>
  <dc:identifier>10.48623/aperta.286016</dc:identifier>
  <dc:identifier>oai:aperta.ulakbim.gov.tr:286016</dc:identifier>
  <dc:language>eng</dc:language>
  <dc:rights>info:eu-repo/semantics/openAccess</dc:rights>
  <dc:rights>https://creativecommons.org/licenses/by-nc/4.0/</dc:rights>
  <dc:title>TWiST: Turkish-English Wikipedia &amp; Thesis STEM Terminology Dataset</dc:title>
  <dc:type>info:eu-repo/semantics/other</dc:type>
  <dc:type>dataset</dc:type>
</oai_dc:dc>
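
The Dublin Core record above can be read programmatically. The following is a minimal sketch, assuming only Python's standard library; the `summarize` helper and the abbreviated `RECORD` string (description and some fields omitted, XML declaration dropped) are illustrative and are not part of the Aperta export.

```python
import xml.etree.ElementTree as ET

# Namespace URIs copied from the record above.
NS = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
}

# Abbreviated copy of the record, for illustration only.
RECORD = """<oai_dc:dc xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/">
  <dc:creator>Gebeşçe, Ali</dc:creator>
  <dc:creator>Gül Şahin, Gözde</dc:creator>
  <dc:creator>Amasya, Ege Uğur</dc:creator>
  <dc:date>2025-06-25</dc:date>
  <dc:identifier>10.48623/aperta.286016</dc:identifier>
  <dc:language>eng</dc:language>
  <dc:rights>https://creativecommons.org/licenses/by-nc/4.0/</dc:rights>
  <dc:title>TWiST: Turkish-English Wikipedia &amp; Thesis STEM Terminology Dataset</dc:title>
  <dc:type>dataset</dc:type>
</oai_dc:dc>"""


def summarize(xml_text: str) -> dict:
    """Collect a few Dublin Core fields from an oai_dc record (hypothetical helper)."""
    root = ET.fromstring(xml_text)

    def texts(tag: str) -> list:
        # Dublin Core elements may repeat, so always gather every occurrence.
        return [el.text for el in root.findall(f"dc:{tag}", NS)]

    return {
        "title": texts("title"),
        "creators": texts("creator"),
        "date": texts("date"),
        "identifiers": texts("identifier"),
        "rights": texts("rights"),
    }


if __name__ == "__main__":
    for field, values in summarize(RECORD).items():
        print(field, values)
```

Every field is returned as a list because Dublin Core allows any element to appear more than once, as `dc:creator`, `dc:identifier`, `dc:rights`, and `dc:type` do in this record.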