Published January 1, 2018 | Version v1
Conference paper Open

Building Morphological Chains for Agglutinative Languages

  • 1. Middle East Tech Univ ODTU, Dept Comp Engn, TR-06800 Ankara, Turkey
  • 2. Hacettepe Univ Beytepe, Dept Comp Engn, TR-06800 Ankara, Turkey

Description

In this paper, we build morphological chains for agglutinative languages by using a log-linear model for the morphological segmentation task. The model is based on the unsupervised morphological segmentation system MorphoChains [1]. We extend the MorphoChains log-linear model by expanding the candidate space recursively to cover more split points for agglutinative languages such as Turkish, whereas the original model generates candidates by considering only binary segmentations of each word. The results show that we improve on the state-of-the-art Turkish scores by 12%, reaching an F-measure of 72%, and on the English scores by 3%, reaching an F-measure of 74%. Overall, the system outperforms both MorphoChains and other well-known unsupervised morphological segmentation systems. The results indicate that candidate generation plays an important role in such an unsupervised log-linear model, which is learned using contrastive estimation with negative samples.
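
The difference between binary and recursive candidate generation can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `binary_splits` and `recursive_splits`, the `min_stem` parameter, and the choice to re-split only the left-hand stem are assumptions made for illustration, and the scoring of candidates by the log-linear model is left out entirely.

```python
from typing import List, Tuple


def binary_splits(word: str, min_stem: int = 2) -> List[Tuple[str, str]]:
    """Return every way to cut `word` once into (stem, suffix).

    `min_stem` is a hypothetical lower bound on the stem length.
    """
    return [(word[:i], word[i:]) for i in range(min_stem, len(word))]


def recursive_splits(word: str, min_stem: int = 2) -> List[List[str]]:
    """Return every segmentation reachable by re-splitting the stem.

    Each binary split keeps its suffix and recursively segments its stem,
    so a single word can yield candidates with many split points.
    """
    results = [[word]]  # the unsegmented word is always a candidate
    for stem, suffix in binary_splits(word, min_stem):
        for left in recursive_splits(stem, min_stem):
            results.append(left + [suffix])
    return results


if __name__ == "__main__":
    # Turkish "evlerde" ("in the houses") is commonly analysed as ev + ler + de.
    print(binary_splits("evlerde"))
    print(["ev", "ler", "de"] in recursive_splits("evlerde"))
```

With a single binary cut, "evlerde" can only be divided into two parts, such as "evler" + "de". The recursive variant also contains the three-part candidate "ev" + "ler" + "de", which is the kind of additional split point the abstract refers to. The candidate set grows exponentially with word length in this naive form, so a practical system would need to prune it.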
