DocumentCode :
2875469
Title :
Voice-to-phoneme conversion algorithms for speaker-independent voice-tag applications in embedded platforms
Author :
Cheng, Yan Ming ; Ma, Changxue ; Melnar, Lynette
Author_Institution :
Center for Human Interaction Res., Motorola Labs., Schaumburg, IL
fYear :
2005
fDate :
27 Nov. 2005
Firstpage :
403
Lastpage :
408
Abstract :
In this paper we present two voice-to-phoneme conversion algorithms that extract voice-tag abstractions for speaker-independent voice-tag applications in embedded platforms, which are very sensitive to memory and CPU consumption. In the first approach, a batch-mode voice-to-phoneme conversion handles this task by preserving the commonality of the input feature vectors of multiple voice-tag example utterances. Given multiple example utterances, a feature combination strategy produces an "average" utterance, which a speaker-independent phonetic decoder converts to phonetic strings that serve as the voice-tag representation. In the second approach, a sequential voice-to-phoneme conversion algorithm uncovers the hierarchy of phonetic consensus embedded among multiple phonetic hypotheses generated by a speaker-independent phonetic decoder from multiple example utterances of a voice-tag. The most relevant phonetic hypotheses are then chosen to represent the voice-tag. In speech recognition experiments, the voice-tag representations obtained by these two algorithms are compared with reference phonetic transcriptions of the voice-tags prepared by an expert phonetician. Both algorithms perform either comparably to or significantly better than the manual transcription approach. We conclude that both algorithms are very effective for the targeted purposes.
Keywords :
speaker recognition; speech processing; embedded platforms; phonetic strings; speaker-independent voice-tag applications; speech recognition experiments; voice-tag abstractions; voice-to-phoneme conversion; Computational complexity; Decoding; Embedded computing; Hidden Markov models; Humans; Lattices; Natural languages; Speech recognition;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Automatic Speech Recognition and Understanding, 2005 IEEE Workshop on
Conference_Location :
San Juan
Print_ISBN :
0-7803-9478-X
Electronic_ISBN :
0-7803-9479-8
Type :
conf
DOI :
10.1109/ASRU.2005.1566503
Filename :
1566503