Voice-to-phoneme conversion algorithms for speaker-independent voice-tag applications in embedded platforms

Author

Cheng, Yan Ming ; Ma, Changxue ; Melnar, Lynette

Author_Institution

Center for Human Interaction Res., Motorola Labs., Schaumburg, IL

fYear

2005

fDate

27-27 Nov. 2005

Firstpage

403

Lastpage

408

Abstract

In this paper we present two voice-to-phoneme conversion algorithms that extract voice-tag abstractions for speaker-independent voice-tag applications on embedded platforms, which are very sensitive to memory and CPU consumption. In the first approach, a batch-mode voice-to-phoneme conversion preserves the commonality of the input feature vectors across multiple example utterances of a voice-tag. Given multiple example utterances, a feature combination strategy produces an "average" utterance, which a speaker-independent phonetic decoder converts to phonetic strings that serve as the voice-tag representation. In the second approach, a sequential voice-to-phoneme conversion algorithm uncovers the hierarchy of phonetic consensus embedded among multiple phonetic hypotheses that a speaker-independent phonetic decoder generates from multiple example utterances of a voice-tag. The most relevant phonetic hypotheses are then chosen to represent the voice-tag. In speech recognition experiments, the voice-tag representations obtained by these two algorithms are compared with reference phonetic transcriptions of the voice-tags prepared by an expert phonetician. Both algorithms perform comparably to, or significantly better than, the manual transcription approach. We conclude that both algorithms are very effective for the targeted purposes.
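
As a rough illustration of choosing the "most relevant" phonetic hypotheses from several example utterances, the sketch below ranks hypotheses by their total edit distance to all the others and keeps the most central one. This is a stand-in technique, not the paper's algorithm: the abstract does not say how the consensus hierarchy is computed, and the function names, phoneme symbols, and example data are all hypothetical.

```python
from typing import List, Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def rank_by_consensus(hypotheses: List[List[str]]) -> List[List[str]]:
    """Sort hypotheses by total edit distance to all hypotheses, lowest first.

    Assumption: a low total distance is used here as a simple proxy for
    agreement with the other hypotheses.
    """
    def total_distance(h: List[str]) -> int:
        return sum(edit_distance(h, other) for other in hypotheses)

    return sorted(hypotheses, key=total_distance)


# Hypothetical decoder outputs for four utterances of the same voice-tag.
hyps = [
    ["k", "ae", "t"],
    ["k", "ae", "d"],
    ["k", "ae", "t"],
    ["g", "ae", "t", "s"],
]
best = rank_by_consensus(hyps)[0]
print(best)
```

Keeping the first few entries of the ranked list, rather than only the top one, would correspond to storing several phonetic strings per voice-tag, with memory cost growing with the number of strings kept.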

Keywords

speaker recognition; speech processing; embedded platforms; phonetic strings; speaker-independent voice-tag applications; speech recognition experiments; voice-tag abstractions; voice-to-phoneme conversion; Computational complexity; Decoding; Embedded computing; Hidden Markov models; Humans; Lattices; Natural languages; Speech recognition;

fLanguage

English

Publisher

IEEE

Conference_Titel

Automatic Speech Recognition and Understanding, 2005 IEEE Workshop on

Conference_Location

San Juan

Print_ISBN

0-7803-9478-X

Electronic_ISBN

0-7803-9479-8

Type

conf

DOI

10.1109/ASRU.2005.1566503

Filename

1566503