• DocumentCode
    409671
  • Title
    Decision combination in speech metadata extraction
  • Author
    Lin, Xiaofan
  • Author_Institution
    Hewlett-Packard Labs., Palo Alto, CA, USA
  • Volume
    1
  • fYear
    2003
  • fDate
    9-12 Nov. 2003
  • Firstpage
    560
  • Abstract
    Speech metadata extraction can both improve speech recognition and enable novel interactive voice response applications. Unlike previous research, which concentrates on frame-level signal processing and pattern classification, this paper systematically studies the behavior of decision combination at the utterance level. We analyze the asymptotic characteristics and the factors affecting frame-level classification. In addition, we introduce new methods to combine frame-level decisions more accurately and efficiently, including phoneme/power-based weighting and smart sampling. Experimental results in gender classification are presented.
  • Keywords
    decision theory; meta data; speech recognition; decision combination; frame-level classification; gender classification; interactive voice response; smart sampling; speech metadata extraction; Automatic speech recognition; Data mining; Laboratories; Loudspeakers; Mel frequency cepstral coefficient; Pattern classification; Signal processing; Signal processing algorithms; Speech processing
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Signals, Systems and Computers, 2004. Conference Record of the Thirty-Seventh Asilomar Conference on
  • Print_ISBN
    0-7803-8104-1
  • Type
    conf
  • DOI
    10.1109/ACSSC.2003.1291973
  • Filename
    1291973