Published January 1, 2012 | Version v1
Journal article | Open access

Morpholexical and Discriminative Language Models for Turkish Automatic Speech Recognition

  • 1. Bogazici Univ, Dept Comp Engn, TR-34342 Istanbul, Turkey
  • 2. Bogazici Univ, Dept Elect & Elect Engn, TR-34342 Istanbul, Turkey

Description

This paper introduces two complementary language modeling approaches for morphologically rich languages, aiming to alleviate the out-of-vocabulary (OOV) word problem and to exploit morphology as a knowledge source. The first model, the morpholexical language model, is a generative n-gram model whose modeling units are lexical-grammatical morphemes instead of the commonly used words or statistical sub-words. This paper also proposes a novel approach for integrating morphology as a knowledge source into an automatic speech recognition (ASR) system in the finite-state transducer framework. We accomplish this by building a morpholexical search network, obtained by composing the lexical transducer of a computational lexicon with a morpholexical language model. The second model is a linear reranking model trained discriminatively with a variant of the perceptron algorithm using morpholexical features. This variant, the WER-sensitive perceptron, is shown to perform better for reranking n-best candidates obtained with the generative model. We apply the proposed models to a Turkish broadcast news transcription task and give experimental results. The morpholexical model leads to an elegant morphology-integrated search network with unlimited vocabulary. It is therefore highly effective in alleviating the OOV problem and improves the word error rate (WER) over word and statistical sub-word models by 1.8% and 0.4% absolute, respectively. The discriminatively trained morpholexical model further improves the WER of the system by 0.8% absolute.
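The abstract does not spell out the update rule of the WER-sensitive perceptron, so the following is only a minimal sketch of one plausible reading: a structured perceptron for n-best reranking whose update is scaled by the difference in word errors between the currently top-scoring hypothesis and the oracle (lowest-error) hypothesis. Everything here is an assumption made for illustration: the function names (`edit_distance`, `features`, `score`, `train`, `rerank`), the toy unigram and bigram feature map standing in for the paper's morpholexical features, and the omission of the recognizer's baseline score and of weight averaging.

```python
from collections import Counter


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def features(hyp):
    """Toy feature map (assumption): unigram and bigram counts over tokens."""
    feats = Counter(hyp)
    feats.update(zip(hyp, hyp[1:]))
    return feats


def score(weights, feats):
    """Linear model score: dot product of weights and feature counts."""
    return sum(weights.get(f, 0.0) * v for f, v in feats.items())


def train(data, epochs=5):
    """Train on a list of (reference_tokens, [hypothesis_tokens, ...]) pairs."""
    weights = {}
    for _ in range(epochs):
        for ref, nbest in data:
            errors = [edit_distance(ref, h) for h in nbest]
            # Oracle: the hypothesis with the fewest word errors.
            oracle = min(range(len(nbest)), key=errors.__getitem__)
            # Current best hypothesis under the model.
            best = max(range(len(nbest)),
                       key=lambda k: score(weights, features(nbest[k])))
            # Assumed WER-sensitive step: scale the update by the error gap.
            loss = errors[best] - errors[oracle]
            if loss > 0:
                for f, v in features(nbest[oracle]).items():
                    weights[f] = weights.get(f, 0.0) + loss * v
                for f, v in features(nbest[best]).items():
                    weights[f] = weights.get(f, 0.0) - loss * v
    return weights


def rerank(weights, nbest):
    """Return the hypothesis with the highest model score."""
    return max(nbest, key=lambda h: score(weights, features(h)))
```

In this sketch, a standard perceptron would apply the same update with a fixed step of 1 whenever the top-scoring hypothesis differs from the oracle. Scaling by the error gap makes larger mistakes move the weights further, which is the intuition behind a WER-sensitive variant. A call such as `rerank(train(data), nbest)` would then pick the hypothesis preferred by the learned weights.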

Files

bib-8041c824-ce5b-4a95-9266-1507234c1ae1.txt (210 Bytes)
md5:73fba1aa848f4df9b4b1c4b416564a81