Published January 1, 2007
Version v1
Conference paper
Open
Morphological disambiguation of Turkish text with perceptron algorithm
Creators
- 1. Bogazici Univ, Dept Comp Engn, TR-34342 Istanbul, Turkey
- 2. Bogazici Univ, Dept Elect & Elect Engn, TR-34342 Istanbul, Turkey
Description
This paper describes the application of the perceptron algorithm to the morphological disambiguation of Turkish text. Turkish has a productive derivational morphology. Because of the ambiguity caused by this complex morphology, a word may have multiple morphological parses, each with a different stem or sequence of morphemes. The methodology is based on ranking with the perceptron algorithm, which has been successful in some NLP tasks in English. We use a statistical trigram-based model from previous work as the baseline to enumerate an n-best list of candidate morphological parse sequences for each sentence. We then apply the perceptron algorithm to rerank the n-best list using a set of 23 features. The perceptron trained for morphological disambiguation improves the accuracy of the baseline model from 93.61% to 96.80%. When we train the perceptron as a POS tagger, the accuracy is 98.27%. The Turkish morphological disambiguation and POS tagging results that we obtained are the best reported so far.
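The reranking step described above can be illustrated with a minimal sketch. It assumes each candidate parse sequence in the n-best list has already been mapped to a sparse feature dictionary; the feature names, the toy data, and the function names `score`, `rerank`, and `train` are hypothetical and are not taken from the paper, whose 23 features are not listed in this record. The update shown is the plain perceptron rule, which may differ from the exact variant the authors used.

```python
# Minimal perceptron reranker over n-best lists (illustrative sketch only).
# Each training item is (candidates, gold_index), where candidates is a list
# of sparse feature dicts, one per candidate parse sequence.

def score(weights, feats):
    """Dot product between the weight vector and a sparse feature dict."""
    return sum(weights.get(name, 0.0) * value for name, value in feats.items())

def rerank(weights, candidates):
    """Return the index of the highest-scoring candidate (first one on ties)."""
    return max(range(len(candidates)), key=lambda i: score(weights, candidates[i]))

def train(data, epochs=5):
    """Plain perceptron: on a mistake, move weights toward the gold candidate
    and away from the wrongly predicted one."""
    weights = {}
    for _ in range(epochs):
        for candidates, gold in data:
            pred = rerank(weights, candidates)
            if pred != gold:
                for name, value in candidates[gold].items():
                    weights[name] = weights.get(name, 0.0) + value
                for name, value in candidates[pred].items():
                    weights[name] = weights.get(name, 0.0) - value
    return weights

# Hypothetical toy data: two sentences, two candidates each.
toy_data = [
    ([{"tags:Noun+Verb": 1, "baseline_top": 1}, {"tags:Noun+Noun": 1}], 1),
    ([{"tags:Noun+Noun": 1}, {"tags:Noun+Verb": 1, "baseline_top": 1}], 0),
]

if __name__ == "__main__":
    w = train(toy_data)
    for candidates, gold in toy_data:
        print("predicted:", rerank(w, candidates), "gold:", gold)
```

In this sketch the baseline model's only role is to supply the candidate list; the perceptron then learns weights that prefer the gold candidate over the others in each list.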
Files
| Name | Size | Checksum |
|---|---|---|
| bib-b38fd96b-4ad5-4206-8bc8-f2b7dd1cb82c.txt | 172 Bytes | md5:0ae414fa2eeb70542c5fdb28def01682 |