DocumentCode
409671
Title
Decision combination in speech metadata extraction
Author
Lin, Xiaofan
Author_Institution
Hewlett-Packard Labs., Palo Alto, CA, USA
Volume
1
fYear
2003
fDate
9-12 Nov. 2003
Firstpage
560
Abstract
Speech metadata extraction can both improve speech recognition and enable novel interactive voice response applications. Unlike previous research, which concentrates on frame-level signal processing and pattern classification, this paper systematically studies the behavior of decision combination at the utterance level. We analyze the asymptotic characteristics and the factors affecting frame-level classification. In addition, we introduce new methods for combining frame-level decisions more accurately and efficiently, including phoneme/power-based weighting and smart sampling. Experimental results on gender classification are presented.
Keywords
decision theory; meta data; speech recognition; decision combination; frame-level classification; gender classification; interactive voice response; smart sampling; speech metadata extraction; Automatic speech recognition; Data mining; Laboratories; Loudspeakers; Mel frequency cepstral coefficient; Pattern classification; Signal processing; Signal processing algorithms; Speech processing
fLanguage
English
Publisher
IEEE
Conference_Titel
Signals, Systems and Computers, 2004. Conference Record of the Thirty-Seventh Asilomar Conference on
Print_ISBN
0-7803-8104-1
Type
conf
DOI
10.1109/ACSSC.2003.1291973
Filename
1291973