Conference paper Open Access
Buyuk, Osman; Arslan, Levent
In this paper, we investigate the use of deep neural networks (DNNs) for multi-language age classification from a speaker's voice. For this purpose, speech databases in two different languages are combined to construct a multi-language database. Mel-frequency cepstral coefficients (MFCCs) are extracted for each utterance. A Gaussian mixture model (GMM), a support vector machine (SVM), and a feed-forward DNN system are trained using these features. In the SVM and DNN methods, the GMM means are concatenated to obtain a GMM supervector, and the supervectors are fed into the SVM and DNN for age classification. In the experiments, we observe that multi-language training does not degrade the performance of the SVM and DNN methods compared to matched training, where the training and test languages are the same. In contrast, performance degrades for the traditional GMM method. Additionally, the SVM and DNN significantly outperform the GMM in the multi-language train-test scenario. The absolute performance improvement with the SVM and DNN is approximately 12% and 7% for female and male speakers, respectively.
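The supervector step mentioned in the abstract can be illustrated with a minimal sketch. The abstract only states that GMM means are concatenated; the sketch below assumes one common way to obtain per-utterance means, namely MAP adaptation of the means of a background GMM with diagonal covariances. The function name `map_adapted_supervector`, the `relevance` factor and its default value are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def map_adapted_supervector(frames, ubm_weights, ubm_means, ubm_vars, relevance=16.0):
    """Stack MAP-adapted GMM means into one supervector (illustrative sketch).

    frames:      (T, D) MFCC frames of one utterance
    ubm_weights: (K,)   mixture weights of a background GMM (assumed given)
    ubm_means:   (K, D) component means
    ubm_vars:    (K, D) diagonal covariances
    relevance:   hypothetical relevance factor controlling adaptation strength
    """
    # Log-density of every frame under every diagonal Gaussian: shape (T, K)
    diff = frames[:, None, :] - ubm_means[None, :, :]
    log_gauss = -0.5 * (np.sum(diff ** 2 / ubm_vars, axis=2)
                        + np.sum(np.log(2.0 * np.pi * ubm_vars), axis=1))
    log_post = np.log(ubm_weights) + log_gauss

    # Normalise to component posteriors in a numerically stable way
    log_post -= log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)

    # Zeroth- and first-order statistics per component
    n_k = post.sum(axis=0)        # (K,)
    f_k = post.T @ frames         # (K, D)

    # MAP update of the means only: interpolate between data mean and prior mean
    alpha = (n_k / (n_k + relevance))[:, None]
    data_means = f_k / np.maximum(n_k, 1e-10)[:, None]
    adapted = alpha * data_means + (1.0 - alpha) * ubm_means

    # Concatenate the K adapted means into one vector of length K * D
    return adapted.reshape(-1)
```

Each utterance then maps to one fixed-length vector regardless of its duration, which is what allows a classifier such as an SVM or a feed-forward DNN to consume it directly.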
File name | Size | |
---|---|---|
bib-bdf4f798-9382-49c3-99fa-176657d3e8b9.txt (md5:205efd38f8913351eb41a491343523f4) | 233 Bytes | Download |
Views | 41 |
Downloads | 11 |
Data volume | 2.6 kB |
Unique views | 36 |
Unique downloads | 11 |