Regional Information Center for Science and Technology (RICeST) - Improved mixed language speech recognition using asymmetric acoustic model and language model with code-switch inversion constraints

DocumentCode :

1690183

Title :

Improved mixed language speech recognition using asymmetric acoustic model and language model with code-switch inversion constraints

Author :

Li, Ying ; Fung, Pascale

Author_Institution :

Dept. of Electron. & Comput. Eng., Hong Kong Univ. of Sci. & Technol., Hong Kong, China

Year :

2013

Firstpage :

7368

Lastpage :

7372

Abstract :

We propose an integrated framework for large vocabulary continuous mixed language speech recognition that handles the accent effect in the bilingual acoustic model and the inversion constraint well known to linguists in the language model. Our asymmetric acoustic model with phone set extension improves upon previous work by striking a balance between data and phonetic knowledge. Our language model improves upon previous work by (1) using the inversion constraint to predict code switching points in the mixed language and (2) integrating a code-switch prediction model, a translation model and a reconstruction model together. This integration means that our language model avoids the pitfall of propagated error that could arise from decoupling these steps. Finally, a WFST-based decoder integrates the acoustic models, code-switch language model and a monolingual language model in the matrix language all together. Our system reduces word error rate by 1.88% on a lecture speech corpus and by 2.43% on a lunch conversation corpus, with statistical significance, over the conventional bilingual acoustic model and interpolated language model.
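The abstract reports its results as word error rate (WER) reductions. As a minimal sketch of how WER is conventionally computed (word-level edit distance divided by the number of reference words), the following may help; it is not taken from the paper, the function name `word_error_rate` is hypothetical, and the record does not state whether the reported reductions are absolute or relative.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Illustrative WER: (substitutions + deletions + insertions) / reference length.

    Assumption: whitespace tokenization. A real mixed-language evaluation
    would need language-appropriate segmentation, which the record does not describe.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution and one deletion against a four-word reference.
    print(word_error_rate("a b c d", "a x c"))
```

Comparing two systems then amounts to computing this value for each system's output on the same test set and taking the difference.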

Keywords :

natural language processing; speech coding; speech recognition; WFST-based decoder; accent effect; asymmetric acoustic model; bilingual acoustic model; code switching points; code-switch inversion constraints; code-switch language model; code-switch prediction model; interpolated language model; large vocabulary continuous mixed language speech recognition; lecture speech corpus; lunch conversation corpus; matrix language; monolingual language model; phone set extension; phonetic knowledge; reconstruction model; translation model; word error rate; Acoustics; Adaptation models; Data models; Hidden Markov models; Predictive models; Speech; Speech recognition; mixed language; multilingual speech recognition;

Language :

English

Publisher :

IEEE

Conference_Title :

Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on

Conference_Location :

Vancouver, BC

ISSN :

1520-6149

Type :

conf

DOI :

10.1109/ICASSP.2013.6639094

Filename :

6639094

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1690183