Journal article Open Access

Model selection and score normalization for text-dependent single utterance speaker verification

Buyuk, Osman; Arslan, Mustafa Levent

DataCite XML

<?xml version='1.0' encoding='utf-8'?>
<resource xmlns:xsi="" xmlns="" xsi:schemaLocation="">
  <identifier identifierType="URL"></identifier>
  <creators>
    <creator>
      <creatorName>Buyuk, Osman</creatorName>
      <affiliation>Bogazici Univ, Dept Elect &amp; Elect Engn, Istanbul, Turkey</affiliation>
    </creator>
    <creator>
      <creatorName>Arslan, Mustafa Levent</creatorName>
      <givenName>Mustafa Levent</givenName>
    </creator>
  </creators>
    <title>Model Selection And Score Normalization For Text-Dependent Single Utterance Speaker Verification</title>
    <date dateType="Issued">2012-01-01</date>
  <resourceType resourceTypeGeneral="Text">Journal article</resourceType>
    <alternateIdentifier alternateIdentifierType="url"></alternateIdentifier>
    <relatedIdentifier relatedIdentifierType="DOI" relationType="IsIdenticalTo">10.3906/elk-1103-35</relatedIdentifier>
    <rights rightsURI="">Creative Commons Attribution</rights>
    <rights rightsURI="info:eu-repo/semantics/openAccess">Open Access</rights>
    <description descriptionType="Abstract">In this paper, we investigate model selection and channel variability issues in a text-dependent single utterance (TDSU) speaker verification application. Due to the lack of an appropriate database for the task, a multichannel speaker recognition database, which consists of multiple recordings of a single Turkish utterance, was collected. The first set of experiments is devoted to model selection. Phonetic hidden Markov model (HMM)-based, sentence HMM-based, and Gaussian mixture model (GMM)-based methods are compared to find the most appropriate modeling approach for the target application. In the experimental results, the HMM-based methods outperform the GMM, and the sentence HMM yields the best performance among the 3 approaches. In the second set of experiments, we implement various score normalization techniques in order to compensate for channel mismatch conditions. Test normalization, zero normalization, and their combinations are investigated for the TDSU task. We propose a novel combination procedure named combined normalization (C-norm). We also use prior knowledge of the handset-channel type to improve the verification performance. In addition to the GMM-based method, a cohort-based channel detection procedure is presented to identify enrollment/authentication channels. In score normalization, handset-dependent C-norm results in the best performance, with a 0.72% equal error rate (EER) in the ideal case where the channel is known and a 0.74% EER when the GMM and cohort-based systems are combined for channel detection.</description>
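The abstract names zero normalization, test normalization, and the equal error rate without defining them. The following is a minimal sketch of their commonly used textbook forms, not the authors' implementation. The function names, the use of population standard deviation, and the threshold sweep used to approximate the EER are all assumptions made for illustration. The proposed C-norm is not reproduced here, because the abstract does not specify how the two normalizations are combined.

```python
from statistics import mean, pstdev


def z_norm(raw_score, impostor_scores_for_model):
    """Zero normalization (assumed textbook form): standardize a score using
    the scores that impostor utterances obtain against the claimed model."""
    mu = mean(impostor_scores_for_model)
    sigma = pstdev(impostor_scores_for_model)
    return (raw_score - mu) / sigma


def t_norm(raw_score, cohort_scores_for_utterance):
    """Test normalization (assumed textbook form): standardize a score using
    the scores that the test utterance obtains against cohort models."""
    mu = mean(cohort_scores_for_utterance)
    sigma = pstdev(cohort_scores_for_utterance)
    return (raw_score - mu) / sigma


def equal_error_rate(target_scores, impostor_scores):
    """Approximate the EER by sweeping every observed score as a threshold
    and returning the error rate where false accepts and false rejects
    are closest to each other."""
    best_gap, eer = float("inf"), None
    for threshold in sorted(set(target_scores) | set(impostor_scores)):
        # A trial is accepted when its score is at or above the threshold.
        frr = sum(s < threshold for s in target_scores) / len(target_scores)
        far = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer
```

The two normalizations share the same formula and differ only in where the statistics come from: Z-norm statistics are estimated per speaker model from impostor utterances, while T-norm statistics are estimated per test utterance from cohort models.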
Views 33
Downloads 7
Data volume 1.4 kB
Unique views 30
Unique downloads 6

