Turkish Normalization Lexicon for Social Media

Demir, Seniz; Tan, Murat; Topcu, Berkay

doi:10.1007/978-3-319-75487-1_33

Published January 1, 2018 | Version v1

Conference paper Open


1. TUBITAK BILGEM, Kocaeli, Turkey

Social media has its own ever-growing language and distinct characteristics. Although social media has been shown to be of great utility to research studies, the varying quality of its written texts degrades the performance of existing NLP tools. Text normalization, the transformation of informal text into well-formed text, appears to be a reasonable preprocessing step for adapting tools trained on other domains to social media. In this study, we compile the first Turkish normalization lexicon, which sheds light on the kinds of lexical variation observed in social media texts. A graph representation built from a text corpus is used to model contextual similarities between normalization equivalences, and the lexicon is generated automatically by performing random walks on this graph. The underlying framework not only enables different lexicons to be generated from the same corpus but also produces lexicons that are tuned to specific genres. Evaluation studies demonstrated the effectiveness of the induced lexicon in normalizing Turkish texts.
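The random-walk idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the graph construction, edge weights, or walk parameters, so everything here is an assumption. The sketch assumes a bipartite graph that links each word to its (left neighbour, right neighbour) contexts, a set of known well-formed words called `vocabulary`, and hypothetical function names (`build_graph`, `induce_lexicon`) with arbitrarily chosen defaults.

```python
import random
from collections import Counter, defaultdict


def build_graph(sentences, window=1):
    """Build an assumed bipartite graph: word nodes <-> context nodes.

    A context is the pair (left neighbours, right neighbours) of a token.
    Edge weights are co-occurrence counts.
    """
    word_to_ctx = defaultdict(Counter)
    ctx_to_word = defaultdict(Counter)
    for tokens in sentences:
        padded = ["<s>"] * window + list(tokens) + ["</s>"] * window
        for i in range(window, len(padded) - window):
            ctx = (tuple(padded[i - window:i]),
                   tuple(padded[i + 1:i + 1 + window]))
            word_to_ctx[padded[i]][ctx] += 1
            ctx_to_word[ctx][padded[i]] += 1
    return word_to_ctx, ctx_to_word


def weighted_choice(counter, rng):
    """Pick one key of a Counter with probability proportional to its count."""
    items = list(counter.keys())
    weights = list(counter.values())
    return rng.choices(items, weights=weights, k=1)[0]


def induce_lexicon(sentences, vocabulary, walks=200, max_steps=4, seed=0):
    """Map each out-of-vocabulary word to the in-vocabulary word that
    random walks starting from it reach most often (illustrative only)."""
    rng = random.Random(seed)
    word_to_ctx, ctx_to_word = build_graph(sentences)
    lexicon = {}
    for word in sorted(word_to_ctx):
        if word in vocabulary:
            continue  # only noisy (out-of-vocabulary) words are normalized
        hits = Counter()
        for _ in range(walks):
            current = word
            for _ in range(max_steps):
                # One step = word -> shared context -> word.
                ctx = weighted_choice(word_to_ctx[current], rng)
                current = weighted_choice(ctx_to_word[ctx], rng)
                if current in vocabulary:
                    hits[current] += 1  # walk absorbed by a well-formed word
                    break
        if hits:
            lexicon[word] = hits.most_common(1)[0][0]
    return lexicon


if __name__ == "__main__":
    # Toy corpus: "geliyom" is an informal spelling of "geliyorum".
    toy_corpus = [
        ["ben", "geliyorum", "şimdi"],
        ["ben", "geliyom", "şimdi"],
        ["ben", "geliyorum", "şimdi"],
    ]
    print(induce_lexicon(toy_corpus, {"ben", "geliyorum", "şimdi"}))
```

In this toy corpus the informal form "geliyom" shares its only context with "geliyorum", so the walks pair the two. A realistic setup would likely also filter candidates by string similarity and frequency, which this sketch omits.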

Files

bib-e0f90470-d9a6-4b96-95f4-93f70345fddf.txt (167 Bytes, md5:d447b21c37d096101993d6fa25d71749)

