An Approach to Low Footprint Pronunciation Models for Embedded Speaker Independent Name Recognition

Author

Yao, Kaisheng; Netsch, Lorin

Author_Institution

Lab. of Speech Technol., Texas Instrum. Inc., Dallas, TX, USA

Volume

4

fYear

2007

fDate

15-20 April 2007

Abstract

Pronunciation modeling is an important component of speaker independent name recognition on embedded devices. Decision trees have been widely used to generate pronunciations of names because of their improved accuracy. However, pronunciation modeling using decision trees may suffer from two main drawbacks. The first is a large memory footprint. The second is that decision trees usually generate a single pronunciation, which does not reflect the multiple real-world pronunciations of a name. We present an approach to address these drawbacks. The approach consists of a letter-to-phoneme mapping method that prunes many irregular pronunciations in order to train compact decision trees, and a multi-stage pronunciation transformation method that generates multiple pronunciations from the output of the trained decision trees. Compared to a baseline, the approach reduces footprint by more than 58% and achieves a word error rate reduction of more than 23%.

Keywords

decision trees; speaker recognition; embedded devices; embedded speaker independent name recognition; letter-to-phoneme mapping method; low footprint pronunciation models; multistage pronunciation transformation method; Automatic speech recognition; Engines; Error analysis; Instruments; Laboratories; Natural languages; Speech recognition; Vegetation mapping; Vocabulary; probabilistic re-write rule; pronunciation model

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech and Signal Processing, 2007. ICASSP 2007. IEEE International Conference on

Conference_Location

Honolulu, HI

ISSN

1520-6149

Print_ISBN

1-4244-0727-3

Type

conf

DOI

10.1109/ICASSP.2007.367232

Filename

4218263