• DocumentCode
    1350280
  • Title
    Speech visualization by integrating features for the hearing impaired
  • Author
    Watanabe, Akira ; Tomishige, Shingo ; Nakatake, Masahiro
  • Author_Institution
    Dept. of Comput. Sci., Kumamoto Univ., Japan
  • Volume
    8
  • Issue
    4
  • fYear
    2000
  • fDate
    7/1/2000
  • Firstpage
    454
  • Lastpage
    466
  • Abstract
    Describes the development of a new speech visualization system that creates readable patterns by integrating different speech features into a single picture. The system extracts phonemic and prosodic features from speech signals and converts them into a visual image using neither speech segmentation nor speech recognition. We used four time-delay neural networks (TDNNs) to generate phonemic features in the new system. Training the TDNNs on three selected frames of eight kinds of acoustic parameters showed significant improvement in performance. The TDNN outputs control the brightness of the patterns used for consonants: each consonant pattern is represented by a different white texture whose brightness is weighted by the output of the corresponding TDNN. All the weighted consonant patterns are simply added and then overlaid synchronously on colors determined by the formant frequencies. When this is done, phonemic sequences and boundaries manifest themselves in the resulting visual patterns. In addition, the color of a single vowel sandwiched between consonants looks uniform. These visual phenomena are very useful for decoding the complex speech code, which is generated by the continuous movements of the speech organs. We evaluated the visualized speech in a preliminary test. When three students read the patterns of 75 words uttered by four males (300 items), the learning curves showed a steep rise and the correct answer rate reached 96-99%. The learning effect was durable: after five months of absence from the system, a subject read 96.3% of the 300 tokens with a response time that averaged only 1.3 s/word.
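The overlay step described in the abstract (consonant textures weighted by TDNN outputs, summed, then laid over a formant-derived color) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the function names `overlay_pixel` and `overlay_image`, the RGB values in [0, 1], and the use of additive blending with clipping are all assumptions, since the abstract does not specify how the overlay is computed.

```python
from typing import List, Sequence, Tuple

Color = Tuple[float, float, float]  # RGB, each channel in [0, 1] (assumed)


def overlay_pixel(base_color: Sequence[float],
                  texture_values: Sequence[float],
                  tdnn_outputs: Sequence[float]) -> Color:
    """Blend one pixel: formant color plus weighted white textures.

    texture_values[k] is the brightness of consonant texture k at this
    pixel; tdnn_outputs[k] is the output of the corresponding network.
    Additive blending with clipping is an assumption of this sketch.
    """
    if len(texture_values) != len(tdnn_outputs):
        raise ValueError("one TDNN output is needed per consonant texture")
    # Sum of textures, each weighted by its network output.
    white = sum(w * t for w, t in zip(tdnn_outputs, texture_values))
    # White light adds equally to every channel; clip to the valid range.
    r, g, b = (min(1.0, max(0.0, c + white)) for c in base_color)
    return (r, g, b)


def overlay_image(color_image: List[List[Color]],
                  textures: List[List[List[float]]],
                  tdnn_outputs: Sequence[float]) -> List[List[Color]]:
    """Apply overlay_pixel to every pixel of a rows-by-columns image."""
    return [
        [
            overlay_pixel(pixel, [tex[y][x] for tex in textures], tdnn_outputs)
            for x, pixel in enumerate(row)
        ]
        for y, row in enumerate(color_image)
    ]


# Hypothetical 1x2 image with a uniform formant color and two textures.
frame = [[(0.2, 0.1, 0.0), (0.2, 0.1, 0.0)]]
textures = [
    [[1.0, 0.0]],  # texture 0 is bright only at the left pixel
    [[0.0, 1.0]],  # texture 1 is bright only at the right pixel
]
blended = overlay_image(frame, textures, tdnn_outputs=[0.5, 0.0])
```

With only the first network active, the left pixel is brightened while the right pixel keeps the plain formant color, which illustrates how a consonant pattern could appear over a uniform vowel color.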
  • Keywords
    data visualisation; handicapped aids; hearing aids; image representation; neural nets; speech processing; acoustic parameters; boundaries; color; consonants; formant frequencies; hearing impaired; learning curves; phonemic features; phonemic sequences; prosodic features; readable patterns; speech features; speech signals; speech visualization; time-delay neural networks; visual image; visual pattern; vowel; white texture; Auditory system; Brightness; Decoding; Frequency; Image converters; Image segmentation; Neural networks; Speech coding; Speech recognition; Visualization
  • fLanguage
    English
  • Journal_Title
    Speech and Audio Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1063-6676
  • Type
    jour
  • DOI
    10.1109/89.848226
  • Filename
    848226