Conference paper (Open Access)

SPEECH DETECTION ON BROADCAST AUDIO

   Zubari, Unal; Ozan, Ezgi Can; Acar, Banu Oskay; Ciloglu, Tolga; Esen, Ersin; Ates, Tugrul K.; Onur, Duygu Oskay

Speech boundary detection contributes to the performance of speech-based applications such as speech recognition and speaker recognition. The speech boundary detector implemented in this study works on broadcast audio as a pre-processing module of a keyword spotter. Speech boundary detection is handled in three steps. In the first step, the audio data is segmented into homogeneous regions in an unsupervised manner. In the second step, an ACTIVITY/NON-ACTIVITY decision is made for each region. In the third step, ACTIVITY regions are classified as Speech/Non-speech via Gaussian Mixture Model (GMM) based classification. The GMMs are trained using a novel feature, Spectral Flow Direction (SFD), and an improved multi-band harmonicity feature, in addition to the widely used Mel Frequency Cepstral Coefficients (MFCCs).
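
The final classification step can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes diagonal-covariance GMMs whose parameters are already trained, equal class priors, and a decision threshold of zero on the average frame-level log-likelihood ratio. The function names, the toy two-dimensional models, and the feature values are all hypothetical; in the study the feature vectors would be built from SFD, multi-band harmonicity, and MFCCs.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(x, gmm):
    """Log-likelihood of x under a GMM given as (weight, mean, var) triples."""
    terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    top = max(terms)  # log-sum-exp trick for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in terms))


def classify_region(frames, speech_gmm, nonspeech_gmm):
    """Label a region by its average frame-level log-likelihood ratio.

    Assumption: equal priors and a zero threshold, chosen for illustration.
    """
    llr = sum(
        gmm_log_likelihood(f, speech_gmm) - gmm_log_likelihood(f, nonspeech_gmm)
        for f in frames
    ) / len(frames)
    return "Speech" if llr > 0.0 else "Non-speech"


# Hypothetical toy models over 2-D features (not values from the paper).
SPEECH_GMM = [(0.6, [1.0, 1.0], [0.5, 0.5]), (0.4, [2.0, 0.0], [0.5, 0.5])]
NONSPEECH_GMM = [(0.5, [-1.0, -1.0], [0.5, 0.5]), (0.5, [-2.0, 0.0], [0.5, 0.5])]

if __name__ == "__main__":
    region = [[1.1, 0.9], [1.8, 0.2]]  # frames of one ACTIVITY region
    print(classify_region(region, SPEECH_GMM, NONSPEECH_GMM))
```

Scoring whole regions rather than single frames matches the region-level decision described in the abstract: averaging the log-likelihood ratio over all frames of a homogeneous region smooths out frame-level noise before the Speech/Non-speech label is assigned.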
