• DocumentCode
    2269023
  • Title

    Audio-Visual Automatic Speech Recognition for Connected Digits

  • Author

    Wang, Xiaoping ; Hao, Yufeng ; Fu, Degang ; Yuan, Chunwei

  • Author_Institution
    State Key Lab. of Bioelectronics, Southeast Univ., Nanjing
  • Volume
    3
  • fYear
    2008
  • fDate
    20-22 Dec. 2008
  • Firstpage
    328
  • Lastpage
    332
  • Abstract
    Audio-visual automatic speech recognition (ASR) is an active research topic in the field of human-computer interaction (HCI). This paper implements an audio-visual ASR system for Chinese connected digits and focuses on the method of speech segmentation. A novel speech segmentation approach is proposed that combines Otsu's method with the traditional method based on short-time energy and zero-crossing rate (ZCR). Experimental results show its efficiency compared with the traditional method. Discrete cosine transform (DCT) coefficients and Mel frequency cepstral coefficients (MFCC) are then used as the visual and audio features, respectively. After speaker-independent recognition tasks are carried out, the performance of audio-visual ASR and audio-only ASR is compared under different noise conditions.
  • Keywords
    discrete cosine transforms; human computer interaction; speech recognition; Chinese connected digits; Mel frequency cepstral coefficients; audio-only ASR; audio-visual automatic speech recognition; discrete cosine transform coefficients; human-computer interaction; short-time energy-based method; speech segmentation; zero-crossing rate-based method; Automatic speech recognition; Discrete cosine transforms; Feature extraction; Flowcharts; Hidden Markov models; Human computer interaction; Image segmentation; Mel frequency cepstral coefficient; Neural networks; Skin; Otsu's method; audio-visual automatic speech recognition; endpoint detection; speech segmentation;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Intelligent Information Technology Application, 2008. IITA '08. Second International Symposium on
  • Conference_Location
    Shanghai
  • Print_ISBN
    978-0-7695-3497-8
  • Type

    conf

  • DOI
    10.1109/IITA.2008.82
  • Filename
    4740012
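
The segmentation idea summarized in the abstract (Otsu's method combined with short-time energy and ZCR) can be illustrated with a minimal sketch. This is not the authors' algorithm, whose details the record does not give. It assumes that Otsu's method picks the threshold on per-frame log energy and that ZCR is used only to extend segment edges, a common endpoint-detection heuristic. The function names, the frame size, the hop and `zcr_limit` are all hypothetical choices made for illustration.

```python
import math


def frames(signal, size, hop):
    """Split a list of samples into overlapping frames."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, hop)]


def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def otsu_threshold(values, bins=64):
    """Otsu's method: threshold that maximizes between-class variance."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return lo
    width = (hi - lo) / bins
    hist = [0] * bins
    for v in values:
        hist[min(int((v - lo) / width), bins - 1)] += 1
    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_var, best_bin = -1.0, 0
    w0, sum0 = 0, 0.0
    for i, h in enumerate(hist):
        w0 += h
        sum0 += i * h
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_bin = var, i
    # Upper edge of the last bin assigned to the low-energy class.
    return lo + (best_bin + 1) * width


def segment_speech(signal, size=200, hop=100, zcr_limit=0.25):
    """Return (start_sample, end_sample) pairs for detected speech runs."""
    fr = frames(signal, size, hop)
    if not fr:
        return []
    log_e = [math.log10(short_time_energy(f) + 1e-12) for f in fr]
    zcr = [zero_crossing_rate(f) for f in fr]
    thr = otsu_threshold(log_e)  # assumption: Otsu applied to log energy
    speech = [e > thr for e in log_e]
    n = len(fr)
    # Assumed heuristic: grow segment edges over adjacent high-ZCR frames.
    for i in range(1, n):
        if not speech[i] and speech[i - 1] and zcr[i] > zcr_limit:
            speech[i] = True
    for i in range(n - 2, -1, -1):
        if not speech[i] and speech[i + 1] and zcr[i] > zcr_limit:
            speech[i] = True
    segments, start = [], None
    for i, s in enumerate(speech + [False]):
        if s and start is None:
            start = i
        elif not s and start is not None:
            segments.append((start * hop, (i - 1) * hop + size))
            start = None
    return segments
```

As a usage example, calling `segment_speech` on 4000 zero samples, then 4000 samples of a 200 Hz sine sampled at 8 kHz, then 4000 more zeros should return a single segment running from roughly sample 4000 to sample 8000. Real recordings contain background noise, so the fixed `zcr_limit` would likely need to be estimated from a stretch of known silence.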