Title :
Experiments on Cross-Language Attribute Detection and Phone Recognition With Minimal Target-Specific Training Data
Author :
Siniscalchi, Sabato Marco ; Lyu, Dau-Cheng ; Svendsen, Torbjørn ; Lee, Chin-Hui
Author_Institution :
Fac. of Archit. & Eng., Univ. of Enna Kore, Enna, Italy
fDate :
3/1/2012
Abstract :
A state-of-the-art automatic speech recognition (ASR) system can often achieve high accuracy for most spoken languages of interest if a large amount of speech material can be collected and used to train a set of language-specific acoustic phone models. However, designing good ASR systems with little or no language-specific speech data for resource-limited languages is still a challenging research topic. As a consequence, there has been an increasing interest in exploring knowledge sharing among a large number of languages so that a universal set of acoustic phone units can be defined to work for multiple or even for all languages. This work aims at demonstrating that a recently proposed automatic speech attribute transcription framework can play a key role in designing language-universal acoustic models by sharing speech units among all target languages at the acoustic phonetic attribute level. The language-universal acoustic models are evaluated through phone recognition. It will be shown that good cross-language attribute detection and continuous phone recognition performance can be accomplished for “unseen” languages using minimal training data from the target languages to be recognized. Furthermore, a phone-based background model (PBM) approach will be presented to improve attribute detection accuracies.
Keywords :
speech recognition; ASR system; PBM approach; acoustic phone; acoustic phonetic attribute level; automatic speech attribute transcription framework; automatic speech recognition system; cross-language attribute detection; language-specific acoustic phone models; language-specific speech data; language-universal acoustic models; minimal target-specific training data; minimal training data; phone recognition; phone-based background model approach; resource-limited languages; speech material; speech units; Acoustics; Materials; Speech; Speech processing; Speech recognition; Target recognition; Training data; Knowledge-based system; phonetic features; universal acoustic modeling
Journal_Title :
Audio, Speech, and Language Processing, IEEE Transactions on
DOI :
10.1109/TASL.2011.2167610