DocumentCode :
409671
Title :
Decision combination in speech metadata extraction
Author :
Lin, Xiaofan
Author_Institution :
Hewlett-Packard Labs., Palo Alto, CA, USA
Volume :
1
fYear :
2003
fDate :
9-12 Nov. 2003
Firstpage :
560
Abstract :
Speech metadata extraction can both improve speech recognition and enable novel interactive voice response applications. Unlike previous research, which concentrates on frame-level signal processing and pattern classification, this paper systematically studies the behavior of decision combination at the utterance level. We analyze its asymptotic characteristics and the factors affecting frame-level classification. In addition, we introduce new methods to combine frame-level decisions more accurately and efficiently, including phoneme/power-based weighting and smart sampling. Experimental results in gender classification are presented.
Keywords :
decision theory; meta data; speech recognition; decision combination; frame-level classification; gender classification; interactive voice response; smart sampling; speech metadata extraction; Automatic speech recognition; Data mining; Laboratories; Loudspeakers; Mel frequency cepstral coefficient; Pattern classification; Signal processing; Signal processing algorithms; Speech processing
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Signals, Systems and Computers, 2004. Conference Record of the Thirty-Seventh Asilomar Conference on
Print_ISBN :
0-7803-8104-1
Type :
conf
DOI :
10.1109/ACSSC.2003.1291973
Filename :
1291973