Title :
Integration of phonetic and prosodic information for robust utterance verification
Author :
Wu, C.-H. ; Chen, Y.-J. ; Yan, G.-L.
Author_Institution :
Dept. of Comput. Sci. & Inf. Eng., Nat. Cheng Kung Univ., Tainan, Taiwan
Date :
2/1/2000
Abstract :
Mandarin speech is known for its tonal characteristics, and prosodic information plays an important role in Mandarin speech recognition. Driven by this property, phonetic and prosodic information is integrated and used for Mandarin telephone speech keyword spotting. A two-stage strategy, with recognition followed by verification, is adopted. For keyword recognition, 132 subsyllable models, two general acoustic filler models and one background/silence model are separately trained and used as the basic recognition units. For utterance verification, 12 anti-subsyllable models, 175 context-dependent prosodic models and five anti-prosodic models are constructed. A keyword verification function combining phonetic-phase and prosodic-phase verification is investigated. Using a test set of 3088 conversational speech utterances from 33 speakers (20 males and 13 females) and a vocabulary of 2583 faculty names, the proposed verification method achieves an 18.3% false alarm rate at 8.5% false rejection. Furthermore, the method correctly rejects 90.9% of non-keywords. Comparison with a baseline system without prosodic-phase verification shows that prosodic information benefits verification performance.
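The abstract describes a verification function that combines phonetic-phase and prosodic-phase scores to accept or reject each keyword hypothesis. The following is only a minimal illustrative sketch of that kind of two-phase combination, not the paper's actual function: it assumes log-likelihood-ratio scores of target models against anti-models and a simple weighted sum, with the weight `alpha` and the threshold chosen hypothetically.

```python
# Illustrative sketch (assumptions, not the authors' implementation):
# combine a phonetic-phase and a prosodic-phase confidence score for one
# hypothesised keyword and compare the result against a single threshold.

def llr(target_loglik: float, anti_loglik: float) -> float:
    """Log-likelihood ratio of a target model against its anti-model."""
    return target_loglik - anti_loglik

def verify_keyword(phonetic_llr: float, prosodic_llr: float,
                   alpha: float = 0.5, threshold: float = 0.0) -> bool:
    """Accept the keyword if the weighted combination of the phonetic-phase
    and prosodic-phase scores exceeds the threshold (alpha and threshold
    are placeholder values for illustration)."""
    combined = alpha * phonetic_llr + (1.0 - alpha) * prosodic_llr
    return combined >= threshold

if __name__ == "__main__":
    # Hypothetical per-phase log-likelihoods for one keyword hypothesis.
    phon = llr(target_loglik=-42.1, anti_loglik=-45.7)  # subsyllable vs. anti-subsyllable
    pros = llr(target_loglik=-12.3, anti_loglik=-13.0)  # prosodic vs. anti-prosodic
    print("accept" if verify_keyword(phon, pros) else "reject")
```

In the paper, the phonetic-phase scores come from the anti-subsyllable models and the prosodic-phase scores from the context-dependent prosodic and anti-prosodic models; the exact combination rule and thresholding are given there, not here.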
Keywords :
feature extraction; natural languages; speech processing; speech recognition; Mandarin speech; Mandarin speech recognition; Mandarin telephone speech keyword spotting; acoustic filler models; anti-prosodic models; anti-subsyllable models; background/silence model; baseline system; context-dependent prosodic models; keyword recognition; phonetic information; prosodic information; prosodic-phase verification; robust utterance verification; tonal characteristic; two-stage strategy; verification performance;
Journal_Title :
IEE Proceedings - Vision, Image and Signal Processing
DOI :
10.1049/ip-vis:20000099