Title :
Robust phone set mapping using decision tree clustering for cross-lingual phone recognition
Author :
Sim, Khe Chai ; Li, Haizhou
Author_Institution :
Inst. for Infocomm Res., Singapore
Date :
March 31 - April 4, 2008
Abstract :
Recently, research on multi-lingual and cross-lingual speech has gained popularity. One of the major problems when dealing with multi-lingual speech data is mapping the phone sets of different languages. Phone mapping is useful for cross-lingual speech recognition, cross-lingual pronunciation modelling and mixed-language speech synthesis, among other applications. In this paper, an automatic context-sensitive phone set mapping method is presented to improve the mapping accuracy. A training methodology that allows the mapping to be learned automatically from parallel time-aligned phone transcriptions is also described. In particular, a decision tree clustering technique is used to tie unseen contexts for robustness. The quality of the proposed mapping method is evaluated on a cross-lingual phone recognition task in which Hungarian and Russian phone recognisers are used to recognise Czech speech and produce Czech phone sequences through phone set mapping. The mapping was trained on only a small amount of data. A consistent relative improvement of 5-7% is reported when contextual information is added to phone set mapping.
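The idea of a context-sensitive phone mapping learned from parallel transcriptions can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract states that unseen contexts are tied by decision tree clustering, whereas the sketch below substitutes a much simpler fallback to a context-independent mapping. The function names, the `<s>` boundary symbol, and the phone labels are all hypothetical, and the input is assumed to be already reduced to (left context, source phone, right context, target phone) tuples.

```python
from collections import Counter, defaultdict


def train_mapping(aligned):
    """Learn phone mappings from (left, source, right, target) tuples.

    Returns a context-dependent map keyed by (left, source, right) and a
    context-independent map keyed by the source phone alone. Each maps to
    the most frequently co-occurring target phone.
    """
    ctx_counts = defaultdict(Counter)
    ci_counts = defaultdict(Counter)
    for left, src, right, tgt in aligned:
        ctx_counts[(left, src, right)][tgt] += 1
        ci_counts[src][tgt] += 1
    ctx_map = {k: c.most_common(1)[0][0] for k, c in ctx_counts.items()}
    ci_map = {k: c.most_common(1)[0][0] for k, c in ci_counts.items()}
    return ctx_map, ci_map


def map_sequence(phones, ctx_map, ci_map, boundary="<s>"):
    """Map a source phone sequence to target phones.

    Seen contexts use the context-dependent map. Unseen contexts fall back
    to the context-independent map (a stand-in for decision tree tying),
    and unknown phones are passed through unchanged.
    """
    padded = [boundary] + list(phones) + [boundary]
    mapped = []
    for i in range(1, len(padded) - 1):
        key = (padded[i - 1], padded[i], padded[i + 1])
        fallback = ci_map.get(padded[i], padded[i])
        mapped.append(ctx_map.get(key, fallback))
    return mapped


# Toy data with made-up phone labels (assumption, not from the paper).
toy_pairs = [
    ("<s>", "a", "b", "A1"),
    ("<s>", "a", "b", "A1"),
    ("b", "a", "<s>", "A2"),
    ("a", "b", "a", "B"),
]
ctx_map, ci_map = train_mapping(toy_pairs)
seen_result = map_sequence(["a", "b", "a"], ctx_map, ci_map)
unseen_result = map_sequence(["b", "a", "b"], ctx_map, ci_map)
```

In the toy data the source phone `a` maps to different targets depending on its neighbours, which a context-independent mapping cannot express. A decision tree would generalise to unseen contexts by sharing statistics among similar contexts rather than discarding the context entirely, as this fallback does.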
Keywords :
decision trees; natural language processing; speech recognition; speech synthesis; Czech phone sequences; Hungarian phone recognition; Russian phone recognition; automatic context sensitive phone set mapping method; cross-lingual phone recognition; cross-lingual pronunciation modelling; decision tree clustering; mixed language speech synthesis; multi-lingual speech; parallel time-aligned phone transcriptions; robust phone set mapping; Automatic speech recognition; Context modeling; Natural languages; Robust control; Robustness; Speech analysis; Target recognition; context sensitive mapping; cross-lingual; phone recognition
Conference_Title :
Acoustics, Speech and Signal Processing, 2008. ICASSP 2008. IEEE International Conference on
Conference_Location :
Las Vegas, NV
Print_ISBN :
978-1-4244-1483-3
Electronic_ISBN :
1520-6149
DOI :
10.1109/ICASSP.2008.4518608