• DocumentCode
    2406901
  • Title
    Feature normalization and selection for robust speaker state recognition
  • Author
    Huang, Chien-Lin ; Tsao, Yu ; Hori, Chiori ; Kashioka, Hideki
  • Author_Institution
    Spoken Language Commun. Group, Nat. Inst. of Inf. & Commun. Technol., Kyoto, Japan
  • fYear
    2011
  • fDate
    26-28 Oct. 2011
  • Firstpage
    102
  • Lastpage
    105
  • Abstract
    In this paper, we propose an integration process of feature compensation and selection on the collective acoustic feature sets to derive a set of advanced acoustic features for speaker state recognition. For feature normalization, we perform two-dimensional histogram equalization (2-D HEQ) normalization to reduce the variability due to speaker and speaking environment factors. For feature selection, we apply principal component analysis (PCA)-based feature selection to extract meaningful parameters from the original acoustic feature sets and to eliminate redundant components. We conducted experiments on the Alcohol Language Corpus (ALC) and the Sleepy Language Corpus (SLC) provided in the INTERSPEECH 2011 Speaker State Challenge. The openSMILE toolkit is used to extract acoustic features consisting of low-level descriptors and their related functionals. Experimental results show that the derived acoustic feature set, processed by 2-D HEQ normalization and PCA-based selection, gives improvements over the original feature sets. The results verify that the derived acoustic feature set is a discriminative and compact representation that efficiently exploits multiple knowledge sources from the ensemble acoustic feature sets.
  • Keywords
    acoustic signal processing; feature extraction; principal component analysis; speaker recognition; 2D HEQ; 2D histogram equalization normalization; INTERSPEECH 2011 Speaker State Challenge; PCA-based selection; alcohol language corpus; collective acoustic feature sets; feature normalization; feature selection; openSMILE toolkit; principal component analysis; redundant component elimination; robust speaker state recognition; sleepy language corpus; speaker variability reduction; speaking environment factors; Accuracy; Acoustics; Feature extraction; Sleep; Speech; Speech recognition; Vectors;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Speech Database and Assessments (Oriental COCOSDA), 2011 International Conference on
  • Conference_Location
    Hsinchu
  • Print_ISBN
    978-1-4577-0930-2
  • Type
    conf
  • DOI
    10.1109/ICSDA.2011.6085988
  • Filename
    6085988
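
The normalize-then-select pipeline described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not specify how 2-D HEQ is computed, so the sketch substitutes ordinary per-dimension (1-D) histogram equalization onto a standard normal reference, and it uses a plain SVD-based PCA projection in place of the paper's PCA-based selection. The function names `histogram_equalize` and `pca_reduce`, the synthetic input, and the choice of three components are hypothetical and not taken from the paper.

```python
import numpy as np
from statistics import NormalDist


def histogram_equalize(features):
    """Map each feature column onto a standard normal reference distribution.

    Assumption: simplified 1-D HEQ (rank-based empirical CDF), used here as a
    stand-in for the 2-D HEQ named in the abstract.
    """
    features = np.asarray(features, dtype=float)
    n_samples = features.shape[0]
    # Rank of each value within its own column (0 .. n_samples - 1).
    ranks = features.argsort(axis=0).argsort(axis=0)
    # Empirical CDF, offset by 0.5 so it stays strictly inside (0, 1).
    cdf = (ranks + 0.5) / n_samples
    # Push the CDF values through the inverse CDF of the reference Gaussian.
    inv_cdf = np.vectorize(NormalDist().inv_cdf)
    return inv_cdf(cdf)


def pca_reduce(features, n_components):
    """Project mean-centered features onto their top principal components."""
    centered = features - features.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


# Synthetic, skewed data standing in for openSMILE functionals (hypothetical).
rng = np.random.default_rng(0)
raw = rng.exponential(size=(200, 6))
compact = pca_reduce(histogram_equalize(raw), n_components=3)
```

Here `compact` holds one three-dimensional vector per input row. In a real experiment, the equalization mapping and the PCA directions would presumably be estimated on training data and then applied unchanged to test data.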