Turkish Broadcast News Transcription and Retrieval

Arisoy, Ebru; Can, Dogan; Parlak, Siddika; Sak, Hasim; Saraclar, Murat

doi:10.1109/TASL.2008.2012313

Published January 1, 2009 | Version v1

Journal article | Open access

1. Bogazici Univ, Dept Elect & Elect Engn, TR-34342 Istanbul, Turkey
2. Bogazici Univ, Dept Comp Engn, TR-34342 Istanbul, Turkey

This paper summarizes our recent efforts to build a Turkish Broadcast News transcription and retrieval system. The agglutinative nature of Turkish leads to a high number of out-of-vocabulary (OOV) words, which in turn lower automatic speech recognition (ASR) accuracy. This situation compromises the performance of speech retrieval systems based on ASR output. Therefore, a word-based ASR system is not adequate for transcribing Turkish speech. To alleviate this problem, various sub-word-based recognition units are utilized. These units solve the OOV problem with moderate-size vocabularies and perform even better than a 500K-word vocabulary in terms of recognition accuracy. As a novel approach, the interaction between the recognition units (words and sub-words) and discriminative training is explored. Sub-word models benefit from discriminative training more than word models do, especially in the discriminative language modeling framework. For speech retrieval, a spoken term detection system based on automata indexation is utilized. As with transcription, retrieval performance is measured under various schemes incorporating words and sub-words. The best results are obtained using a cascade of word and sub-word indexes together with term-specific thresholding.

Files (190 Bytes)

Name	Size
bib-9ca55a80-a471-42ef-a00f-3dc72fe1f708.txt (md5:0fcacb66cf67e8a7475aff8612c8fbac)	190 Bytes

