Published January 1, 2009 | Version v1
Conference paper | Open Access

Integrating Morphology into Automatic Speech Recognition

  • 1. Bogazici Univ, Dept Comp Engn, TR-34342 Istanbul, Turkey
  • 2. Bogazici Univ, Dept Elect & Elect Engn, TR-34342 Bebek, Turkey

Description

This paper proposes a novel approach to integrating morphology as a model into an automatic speech recognition (ASR) system for morphologically rich languages. High out-of-vocabulary (OOV) word rates have been a major challenge for ASR in morphologically productive languages. The standard approach to this problem has been to shift from words to sub-word units in language modeling, so that the only change to the system is in the language model estimated over these units. In contrast, we propose to integrate morphology directly into the search network, like any other knowledge source such as the lexicon and the language model. The morphological parser for a language, implemented as a finite-state lexical transducer, can be considered a computational lexicon. This computational lexicon represents a dynamic vocabulary, in contrast to the static vocabulary generally used for ASR. We compose the transducer for this computational lexicon with a statistical language model over lexical morphemes to obtain a morphology-integrated search network. The resulting search network generates only grammatical word forms and improves recognition accuracy by reducing the OOV rate. We give experimental results for Turkish broadcast news transcription and show that the proposed approach outperforms the 50K and 100K vocabulary word models, while the 200K vocabulary word model is slightly better.
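The composition of a lexical transducer with a language model over lexical morphemes can be illustrated with a toy sketch. This is not the authors' implementation: it assumes a minimal epsilon-free weighted FST in the tropical semiring (costs added along a path), and every morph, lexical tag (`ev+Noun`, `+A3pl`, `+Loc`), cost, and helper name (`FST`, `compose`, `best_path`) is hypothetical. Real systems typically use an FST toolkit such as OpenFst and also compose acoustic and phonetic transducers, which are omitted here. The point is that the composed network accepts only morpheme sequences that the lexicon's morphotactics allow, while still scoring them with the language model.

```python
from collections import defaultdict


class FST:
    """Minimal epsilon-free weighted FST in the tropical semiring:
    weights are costs (e.g. negative log probabilities) summed along a path."""

    def __init__(self, start, finals=None):
        self.start = start
        self.finals = dict(finals or {})   # state -> final cost
        self.arcs = defaultdict(list)      # state -> [(ilabel, olabel, cost, dst)]

    def add_arc(self, src, ilabel, olabel, cost, dst):
        self.arcs[src].append((ilabel, olabel, cost, dst))


def compose(a, b):
    """Compose A (x:y) with B (y:z) by pairing arcs whose A-output equals B-input.
    Only pair states reachable from the joint start state are built."""
    start = (a.start, b.start)
    out = FST(start)
    stack, seen = [start], {start}
    while stack:
        qa, qb = stack.pop()
        if qa in a.finals and qb in b.finals:
            out.finals[(qa, qb)] = a.finals[qa] + b.finals[qb]
        for ia, oa, wa, na in a.arcs[qa]:
            for ib, ob, wb, nb in b.arcs[qb]:
                if oa == ib:
                    dst = (na, nb)
                    out.add_arc((qa, qb), ia, ob, wa + wb, dst)
                    if dst not in seen:
                        seen.add(dst)
                        stack.append(dst)
    return out


def best_path(fst, inputs):
    """Cheapest (cost, output labels) path that consumes `inputs`, or None if rejected."""
    frontier = {fst.start: (0.0, ())}
    for sym in inputs:
        nxt = {}
        for q, (cost, outs) in frontier.items():
            for il, ol, w, dst in fst.arcs[q]:
                if il == sym and (dst not in nxt or cost + w < nxt[dst][0]):
                    nxt[dst] = (cost + w, outs + (ol,))
        frontier = nxt
    done = [(c + fst.finals[q], outs) for q, (c, outs) in frontier.items() if q in fst.finals]
    return min(done) if done else None


# Hypothetical toy lexical transducer (surface morph -> lexical morpheme).
# Morphotactics: noun stem, optional plural, optional locative; separate front/back
# stem states encode a tiny piece of Turkish vowel harmony (ev-ler-de, okul-lar-da).
lexicon = FST(0, {q: 0.0 for q in (1, 2, 3, 4, 5)})
for src, surf, lex, dst in [
    (0, "ev", "ev+Noun", 1), (0, "okul", "okul+Noun", 4),   # front / back stems
    (1, "ler", "+A3pl", 2), (4, "lar", "+A3pl", 5),          # plural
    (1, "de", "+Loc", 3), (2, "de", "+Loc", 3),              # locative, front
    (4, "da", "+Loc", 3), (5, "da", "+Loc", 3),              # locative, back
]:
    lexicon.add_arc(src, surf, lex, 0.0, dst)

# Hypothetical one-state unigram "language model" over lexical morphemes,
# written as a weighted acceptor (input label == output label); costs are made up.
lm = FST(0, {0: 0.0})
for morpheme, cost in {"ev+Noun": 2.0, "okul+Noun": 2.5, "+A3pl": 1.0, "+Loc": 1.2}.items():
    lm.add_arc(0, morpheme, morpheme, cost, 0)

network = compose(lexicon, lm)   # toy "morphology-integrated search network"
print(best_path(network, ["ev", "ler", "de"]))
print(best_path(network, ["ev", "de", "ler"]))
```

The first query returns a path whose output is `('ev+Noun', '+A3pl', '+Loc')` together with its summed language model cost. The second returns `None` because the lexicon allows no morpheme after the locative, so the ungrammatical order is never generated. Forms such as `okul lar da` are produced by the same small network without being listed in any static word vocabulary, which is the sense in which the computational lexicon acts as a dynamic vocabulary.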
