Title :
Audio-Visual Automatic Speech Recognition for Connected Digits
Author :
Wang, Xiaoping ; Hao, Yufeng ; Fu, Degang ; Yuan, Chunwei
Author_Institution :
State Key Lab. of Bioelectronics, Southeast Univ., Nanjing
Abstract :
Audio-visual automatic speech recognition (ASR) is a hotspot in the field of human-computer interaction (HCI). This paper implemented an audio-visual ASR system for Chinese connected digits and addressed the problem of speech segmentation. A novel speech segmentation approach was proposed that combines Otsu's method with the traditional method based on short-time energy and zero-crossing rate (ZCR). Experimental results showed its effectiveness compared with the traditional method. Discrete cosine transform (DCT) coefficients and Mel frequency cepstral coefficients (MFCC) were then used as the visual and audio features, respectively. After the speaker-independent recognition tasks were carried out, the performances of audio-visual ASR and audio-only ASR under different noise conditions were compared.
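The following is a minimal Python sketch of the segmentation idea outlined in the abstract: Otsu's automatic thresholding applied on top of short-time energy and zero-crossing rate. The frame size, hop length, and the OR-style decision rule are illustrative assumptions, not the authors' exact algorithm.

import numpy as np

def frame_signal(x, frame_len=400, hop=160):
    # Split a 1-D signal into overlapping frames (assumed 25 ms / 10 ms at 16 kHz).
    if len(x) < frame_len:
        return np.zeros((0, frame_len))
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def short_time_energy(frames):
    return np.sum(frames.astype(np.float64) ** 2, axis=1)

def zero_crossing_rate(frames):
    return np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)

def otsu_threshold(values, n_bins=256):
    # Otsu's method: choose the threshold that maximizes between-class variance
    # of the histogram of frame-level feature values.
    hist, edges = np.histogram(values, bins=n_bins)
    p = hist.astype(np.float64) / hist.sum()
    centers = 0.5 * (edges[:-1] + edges[1:])
    w0 = np.cumsum(p)            # probability of class 0 (below threshold)
    mu = np.cumsum(p * centers)  # cumulative mean
    mu_t = mu[-1]
    w1 = 1.0 - w0
    valid = (w0 > 0) & (w1 > 0)
    sigma_b = np.zeros_like(w0)
    sigma_b[valid] = (mu_t * w0[valid] - mu[valid]) ** 2 / (w0[valid] * w1[valid])
    return centers[np.argmax(sigma_b)]

def detect_speech_frames(x):
    # Mark a frame as speech when its energy exceeds the Otsu threshold,
    # or when its ZCR does (to retain low-energy unvoiced sounds).
    frames = frame_signal(np.asarray(x))
    energy = short_time_energy(frames)
    zcr = zero_crossing_rate(frames)
    return (energy > otsu_threshold(energy)) | (zcr > otsu_threshold(zcr))

Consecutive runs of speech frames can then be merged into segments, which is where the combined approach would replace the fixed, hand-tuned thresholds of a purely energy/ZCR-based endpoint detector.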
Keywords :
discrete cosine transforms; human computer interaction; speech recognition; Chinese connected digits; Mel frequency cepstral coefficients; audio-only ASR; audio-visual automatic speech recognition; discrete cosine transform coefficients; human-computer interaction; short-time energy-based method; speech segmentation; zero-crossing rate-based method; Automatic speech recognition; Discrete cosine transforms; Feature extraction; Flowcharts; Hidden Markov models; Human computer interaction; Image segmentation; Mel frequency cepstral coefficient; Neural networks; Skin; Otsu's method; audio-visual automatic speech recognition; endpoint detection; speech segmentation;
Conference_Title :
Intelligent Information Technology Application, 2008. IITA '08. Second International Symposium on
Conference_Location :
Shanghai
Print_ISBN :
978-0-7695-3497-8
DOI :
10.1109/IITA.2008.82