Conference paper, Open Access

Document Embedding based Supervised Methods for Turkish Text Classification

   Celenli, Halil I.; Ozturk, S. Talha; Sahin, Gurkan; Gerek, Aydin; Ganiz, Murat C.

Following the recent increase in the amount of available data, Deep Learning has become the most popular branch of Machine Learning. This trend can also be seen in Natural Language Processing (NLP), especially since textual data can now be scraped from the World Wide Web in vast quantities and used in an unsupervised or semi-supervised manner. For this reason, Deep Learning methods are being used more frequently. In this work we devise several classification methods based on the Paragraph Vector model (a.k.a. Doc2Vec), which represents documents as vectors. These include a k-Nearest Neighbors (k-NN) classifier, Support Vector Machines (SVM), and a Centroid Classifier (CC), all of which operate on the paragraph vectors of documents, as well as a custom-made method that uses pairwise cosine similarities between documents and class centroids in Doc2Vec space as features. Our experiments combine a number of representations and classifiers in various ways. On the representation side, the Paragraph Vector model is compared with Term Frequency (tf) and Term Frequency-Inverse Document Frequency (tf-idf), using SVM, k-NN, CC and Centroid Features Support Vector Machine (CFSVM) as classifiers.
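The centroid-based ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy 2-D vectors stand in for Doc2Vec paragraph vectors, the class labels are invented, and the helper names (`class_centroids`, `centroid_features`, `centroid_classify`) are hypothetical. The sketch assumes a class centroid is the mean of that class's training vectors, which the abstract does not state.

```python
import math
from collections import defaultdict

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def class_centroids(vectors, labels):
    """Assumption: a centroid is the component-wise mean of a class's vectors."""
    groups = defaultdict(list)
    for vec, label in zip(vectors, labels):
        groups[label].append(vec)
    return {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in groups.items()
    }

def centroid_features(vec, centroids, class_order):
    """One cosine similarity per class centroid, in a fixed class order."""
    return [cosine(vec, centroids[c]) for c in class_order]

def centroid_classify(vec, centroids, class_order):
    """Centroid Classifier: choose the class with the most similar centroid."""
    feats = centroid_features(vec, centroids, class_order)
    return class_order[feats.index(max(feats))]

# Toy 2-D stand-ins for paragraph vectors; labels are illustrative only.
train_vectors = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.8]]
train_labels = ["sports", "sports", "politics", "politics"]

centroids = class_centroids(train_vectors, train_labels)
class_order = sorted(centroids)  # fixed feature order: politics, sports
query = [0.8, 0.3]

features = centroid_features(query, centroids, class_order)
predicted = centroid_classify(query, centroids, class_order)
print(class_order, features, predicted)
```

In a setup like the one the abstract describes, the `features` list (one similarity per class) would be passed to an SVM in place of the raw document vector, while `centroid_classify` corresponds to using the centroids directly.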
