Published January 1, 2008 | Version v1
Conference paper Open

Discriminative N-gram Language Modeling for Turkish

  • 1. Bogazici Univ, Dept Elect & Elect Engn, Istanbul, Turkey
  • 2. OGI OHSU, Ctr Spoken Language Understanding, Beaverton, OR USA

Description

In this paper, Discriminative Language Models (DLMs) are applied to the Turkish Broadcast News transcription task. Turkish presents a challenge to Automatic Speech Recognition (ASR) systems due to its rich morphology. Therefore, in addition to word n-gram features, morphology-based features such as root n-grams and inflectional group n-grams are incorporated into the DLMs to improve the language models. Various feature sets reduce the word error rate (WER). Our best result is obtained with the inflectional group n-gram features: a 1.0% absolute improvement over the baseline model, which is statistically significant at p<0.001 as measured by the NIST MAPSSWE significance test.
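To make the feature types named in the description concrete, here is a minimal sketch of how n-gram count features might be extracted from a recognition hypothesis at two levels, words and roots. The function name `ngram_features`, the example sentence, and the hand-made root split are illustrative assumptions, not the paper's implementation; a real system would obtain roots and inflectional groups from a morphological analyzer.

```python
from collections import Counter


def ngram_features(tokens, max_order=2):
    """Count every n-gram of order 1..max_order in a token sequence."""
    feats = Counter()
    for n in range(1, max_order + 1):
        for i in range(len(tokens) - n + 1):
            feats[tuple(tokens[i:i + n])] += 1
    return feats


# Hypothetical hypothesis ("we stayed in the houses") with a hand-made
# root sequence; these tokens are assumptions for illustration only.
words = ["evlerde", "kaldık"]
roots = ["ev", "kal"]

word_feats = ngram_features(words)  # word unigrams and bigrams
root_feats = ngram_features(roots)  # root unigrams and bigrams
```

The same function would apply unchanged to a sequence of inflectional group tags, since it only assumes a list of tokens.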

Files

bib-c60b719d-b5db-4c59-ab16-87263749f433.txt
