• DocumentCode
    1273125
  • Title
    Speech activated telephony email reader (SATER) based on speaker verification and text-to-speech conversion
  • Author
    Wu, Chung-Hsien; Chen, Jau-Hung
  • Author_Institution
    Inst. of Inf. Eng., Nat. Cheng Kung Univ., Tainan, Taiwan
  • Volume
    43
  • Issue
    3
  • fYear
    1997
  • fDate
    8/1/1997
  • Firstpage
    707
  • Lastpage
    716
  • Abstract
    A speech activated telephony email reader (SATER) is proposed. SATER is an integrated system combining speaker verification, network, and text-to-speech conversion. A registered user can activate and listen to his own email through a wired/wireless telephone. In the speaker verification subsystem, a time-varying and speaker-dependent verification phrase is adopted. The speaker's password is used to generate the verification phrases for that speaker. A hidden Markov Model with states of a variable number is used to model each verification phrase. In the text-to-speech (TTS) subsystem, a prosody modification approach is proposed on the basis of word units. Appropriate word prosodic patterns in a sentence are selected from a word prosody database using linguistic features. This system has been tested on 20 subjects. In the speaker verification test, at 1.5% false rejection, the verification system resulted in 0.5% false acceptance. The results for the TTS conversion system indicated that the average correct rate was 95.7% for intelligibility, and that the mean opinion score (MOS) was 3.4 for naturalness.
  • Keywords
    electronic mail; hidden Markov models; speaker recognition; speech intelligibility; speech processing; speech synthesis; telephony; average correct rate; email; false acceptance; false rejection; hidden Markov Model; integrated system; linguistic features; mean opinion score; password; prosody modification; sentence; speaker dependent verification; speaker verification subsystem phrase; speaker verification test; speech activated telephony email reader; speech intelligibility; text to speech conversion; time-varying verification phrase; wired/wireless telephone; word prosodic patterns; word prosody database; word units; Application software; Computer networks; Error analysis; Hidden Markov models; Loudspeakers; Spatial databases; Speech synthesis; System testing; Telephony; Web and internet services;
  • fLanguage
    English
  • Journal_Title
    Consumer Electronics, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0098-3063
  • Type
    jour
  • DOI
    10.1109/30.628698
  • Filename
    628698