Lattice Extension and Vocabulary Adaptation for Turkish LVCSR

Arisoy, Ebru; Saraclar, Murat

doi:10.1109/TASL.2008.2006655

Published January 1, 2009 | Version v1

Journal article, Open access

1. Bogazici Univ, Dept Elect & Elect Engn, TR-34342 Istanbul, Turkey

This paper presents two-pass speech recognition techniques to handle the out-of-vocabulary (OOV) problem in Turkish newspaper content transcription. OOV words are assumed to be replaced by acoustically "similar" in-vocabulary (IV) words during decoding. The first-pass recognition lattice is therefore used as prior knowledge to adapt the vocabulary and the search space for the second pass. Vocabulary adaptation and lattice extension are performed with words similar to the words in the hypothesis lattice. These words are selected from a fallback vocabulary using distance functions that take the agglutinative characteristics of Turkish into account. Morphology-based and phonetic-distance-based similarity functions yield absolute accuracy improvements of 1.9% and 4.6%, respectively. Statistical sub-word units are also used to handle the OOV problem encountered in the word-based system. Using sub-words alleviates the OOV problem and improves recognition accuracy: OOV accuracy improves from 0% to 60.2%. However, sub-words introduce ungrammatical items into the recognition output. Since automatically derived sub-word units do not provide explicit morphological features, the lattice extension strategy is modified to correct these ungrammatical items. Lattice extension for sub-words reduces the word error rate from 33.9% to 32.3%. This improvement is statistically significant at p = 0.002 as measured by the NIST MAPSSWE significance test.
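The selection step described in the abstract, picking fallback-vocabulary words that are close to the words in the first-pass lattice, can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not specify the paper's morphology-based or phonetic distance functions, so plain Levenshtein distance over spellings stands in for them, and the names `edit_distance`, `extend_vocabulary`, `max_distance`, and `top_k` are hypothetical.

```python
from typing import Dict, Iterable, List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit costs (a stand-in for a phonetic distance)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def extend_vocabulary(lattice_words: Iterable[str],
                      fallback_vocab: Iterable[str],
                      max_distance: int = 2,
                      top_k: int = 3) -> Dict[str, List[str]]:
    """Map each first-pass lattice word to its closest fallback-vocabulary words.

    Assumption: candidates within max_distance are kept, nearest first,
    with ties broken alphabetically, and at most top_k are returned.
    """
    fallback = list(fallback_vocab)
    extensions: Dict[str, List[str]] = {}
    for word in sorted(set(lattice_words)):
        scored = sorted(
            (edit_distance(word, cand), cand)
            for cand in fallback
            if cand != word
        )
        extensions[word] = [c for d, c in scored if d <= max_distance][:top_k]
    return extensions


if __name__ == "__main__":
    # Toy example: "ev" (house) and "evler" (houses) appear in the first-pass
    # lattice; inflected forms are only available in the fallback vocabulary.
    lattice = ["ev", "evler"]
    fallback = ["evde", "evlerde", "evleri", "araba"]
    for word, candidates in extend_vocabulary(lattice, fallback).items():
        print(word, "->", candidates)
```

In a second pass, the selected candidates would be added to the recognition vocabulary or placed alongside the original words in the lattice. How they are scored there is not described in the abstract and is left out of this sketch.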

Files

Name	Size
bib-82291011-f6c8-49e5-9e6a-284acab466ee.txt (md5:eb26f76a64a39249b9ac6a45b0d0d9de)	171 Bytes

	All versions	This version
Views	43	43
Downloads	20	20
Data volume	3.6 kB	3.6 kB

TÜBİTAK ULAKBİM
