• DocumentCode
    2875469
  • Title
    Voice-to-phoneme conversion algorithms for speaker-independent voice-tag applications in embedded platforms
  • Author
    Cheng, Yan Ming; Ma, Changxue; Melnar, Lynette
  • Author_Institution
    Center for Human Interaction Res., Motorola Labs., Schaumburg, IL
  • fYear
    2005
  • fDate
    27 Nov. 2005
  • Firstpage
    403
  • Lastpage
    408
  • Abstract
    In this paper we present two voice-to-phoneme conversion algorithms that extract voice-tag abstractions for speaker-independent voice-tag applications in embedded platforms, which are very sensitive to memory and CPU consumption. In the first approach, a batch-mode voice-to-phoneme conversion manages this task by preserving the commonality of the input feature vectors of multiple voice-tag example utterances. Given multiple example utterances, a feature combination strategy produces an "average" utterance, which a speaker-independent phonetic decoder converts to phonetic strings that serve as the voice-tag representation. In the second approach, a sequential voice-to-phoneme conversion algorithm uncovers the hierarchy of phonetic consensus embedded among the multiple phonetic hypotheses that a speaker-independent phonetic decoder generates from multiple example utterances of a voice-tag. The most relevant phonetic hypotheses are then chosen to represent the voice-tag. In speech recognition experiments, the voice-tag representations obtained by these two algorithms are compared with reference phonetic transcriptions of the voice-tags prepared by an expert phonetician. Both algorithms perform either comparably to or significantly better than the manual transcription approach. We conclude from this that both algorithms are very effective for the targeted purposes.
  • Keywords
    speaker recognition; speech processing; embedded platforms; phonetic strings; speaker-independent voice-tag applications; speech recognition experiments; voice-tag abstractions; voice-to-phoneme conversion; Computational complexity; Decoding; Embedded computing; Hidden Markov models; Humans; Lattices; Natural languages; Speech recognition
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Automatic Speech Recognition and Understanding, 2005 IEEE Workshop on
  • Conference_Location
    San Juan
  • Print_ISBN
    0-7803-9478-X
  • Electronic_ISBN
    0-7803-9479-8
  • Type
    conf
  • DOI
    10.1109/ASRU.2005.1566503
  • Filename
    1566503