Published January 1, 2012 | Version v1
Journal article
Open
Morpholexical and Discriminative Language Models for Turkish Automatic Speech Recognition
Creators
- 1. Bogazici Univ, Dept Comp Engn, TR-34342 Istanbul, Turkey
- 2. Bogazici Univ, Dept Elect & Elect Engn, TR-34342 Istanbul, Turkey
Description
This paper introduces two complementary language modeling approaches for morphologically rich languages, aiming to alleviate the out-of-vocabulary (OOV) word problem and to exploit morphology as a knowledge source. The first model, the morpholexical language model, is a generative n-gram model whose modeling units are lexical-grammatical morphemes instead of the commonly used words or statistical sub-words. This paper also proposes a novel approach for integrating morphology as a knowledge source into an automatic speech recognition (ASR) system within the finite-state transducer framework. We accomplish this by building a morpholexical search network, obtained by composing the lexical transducer of a computational lexicon with a morpholexical language model. The second model is a linear reranking model trained discriminatively with a variant of the perceptron algorithm using morpholexical features. This variant, the WER-sensitive perceptron, is shown to perform better for reranking n-best candidates obtained with the generative model. We apply the proposed models to a Turkish broadcast news transcription task and give experimental results. The morpholexical model leads to an elegant morphology-integrated search network with unlimited vocabulary. It is thus highly effective in alleviating the OOV problem and improves the word error rate (WER) over word and statistical sub-word models by 1.8% and 0.4% absolute, respectively. The discriminatively trained morpholexical model further improves the WER of the system by 0.8% absolute.
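The description does not spell out the update rule of the WER-sensitive perceptron, so the following is only a minimal sketch of one plausible reading: a standard n-best reranking perceptron whose update is scaled by the difference in word errors between the current top-scoring hypothesis and the oracle (lowest-error) hypothesis. The function names, the unigram/bigram feature map, and the toy tokens are all assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (r != h)))   # substitution or match
        prev = cur
    return prev[-1]


def features(hyp):
    """Hypothetical feature map: unigram and bigram counts over the tokens."""
    feats = Counter(hyp)
    feats.update(zip(hyp, hyp[1:]))
    return feats


def score(weights, feats):
    """Linear model score: dot product of weights and feature counts."""
    return sum(weights.get(f, 0.0) * v for f, v in feats.items())


def train(nbest_lists, references, epochs=5):
    """Perceptron training with updates scaled by the error gap (assumed rule)."""
    weights = {}
    for _ in range(epochs):
        for nbest, ref in zip(nbest_lists, references):
            errors = [edit_distance(ref, h) for h in nbest]
            oracle = min(range(len(nbest)), key=errors.__getitem__)
            best = max(range(len(nbest)),
                       key=lambda k: score(weights, features(nbest[k])))
            gap = errors[best] - errors[oracle]
            if gap > 0:
                # Move toward the oracle and away from the current best,
                # in proportion to how many more word errors the best has.
                for f, v in features(nbest[oracle]).items():
                    weights[f] = weights.get(f, 0.0) + gap * v
                for f, v in features(nbest[best]).items():
                    weights[f] = weights.get(f, 0.0) - gap * v
    return weights


def rerank(weights, nbest):
    """Return the highest-scoring hypothesis under the trained weights."""
    return max(nbest, key=lambda h: score(weights, features(h)))


# Toy usage with made-up tokens standing in for morpholexical units.
toy_nbest = [["a", "b", "x"], ["a", "b", "c"]]
toy_ref = ["a", "b", "c"]
toy_weights = train([toy_nbest], [toy_ref])
toy_choice = rerank(toy_weights, toy_nbest)
```

In a real system the features would be drawn from morpholexical analyses of each hypothesis and the generative model score would typically be included as an additional feature; both details are left out here to keep the sketch short.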
Files
| Name | Size |
|---|---|
| bib-8041c824-ce5b-4a95-9266-1507234c1ae1.txt (md5:73fba1aa848f4df9b4b1c4b416564a81) | 210 Bytes |