Directory name retrieval over the telephone in the Picasso project

Author

Neubert, F. ; Gravier, Guillaume ; Yvon, F. ; Chollet, G.

Author_Institution

Ecole Nat. Superieure des Telecommun., Paris, France

fYear

1998

fDate

29-30 Sep 1998

Firstpage

31

Lastpage

36

Abstract

The European project Picasso aims to develop and test several telematics transaction services that will be accessible via the worldwide telephone network. Within this framework, ENST is developing an automatic speech recognition system for pronounced and spelled names in French, for telephone-quality speech. The recognizer is based on hidden Markov modeling of speech units, using word models for spelled letters and phone models for name pronunciations. Bigram probabilities for phonemes and letters are introduced at this stage to improve decoding quality. The directory was built automatically from the list of names contained in the database, using a grapheme-to-phoneme converter for the pronunciations and rules for the spellings; each directory entry consists of several pronunciation and spelling variants. After the acoustic recognition phase, the corresponding directory entry is found by dynamic alignment of symbol sequences, with insertion, deletion and substitution costs determined from the training data to account for acoustic confusability. Because this lexical search is very time-consuming for large directories, we present a faster method using pre-selection in a tree-based representation of the lexicon. A rescoring strategy on the 10 best outputs is also evaluated.
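The lexical search step described in the abstract (aligning a recognized symbol string against every variant of every directory entry, with symbol-dependent insertion, deletion and substitution costs) can be sketched as a weighted edit distance. This is a minimal illustration under stated assumptions, not the authors' implementation: the cost tables `SUB_COST`, `INS_COST` and `DEL_COST`, their values, and the helper names `align_cost` and `best_entry` are hypothetical. In the paper the costs are estimated from training data, and the tree-based pre-selection that speeds up the search is not shown here; this sketch searches the directory exhaustively.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical cost tables (assumption). The paper estimates such costs from
# training data to reflect acoustic confusability; these values are invented.
# Keys are (recognized symbol, directory symbol).
SUB_COST: Dict[Tuple[str, str], float] = {
    ("m", "n"): 0.3, ("n", "m"): 0.3,  # spelled "M" / "N" sound alike
    ("b", "d"): 0.4, ("d", "b"): 0.4,  # spelled "B" / "D" sound alike
}
INS_COST: Dict[str, float] = {}  # recognized symbol with no counterpart in the entry
DEL_COST: Dict[str, float] = {}  # entry symbol missed by the recognizer
DEFAULT_SUB = 1.0
DEFAULT_INDEL = 1.0


def sub_cost(hyp: str, ref: str) -> float:
    """Substitution cost; identical symbols align for free."""
    if hyp == ref:
        return 0.0
    return SUB_COST.get((hyp, ref), DEFAULT_SUB)


def align_cost(hyp: Sequence[str], ref: Sequence[str]) -> float:
    """Weighted edit distance (dynamic alignment) between two symbol sequences."""
    n, m = len(hyp), len(ref)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + INS_COST.get(hyp[i - 1], DEFAULT_INDEL)
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + DEL_COST.get(ref[j - 1], DEFAULT_INDEL)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j - 1] + sub_cost(hyp[i - 1], ref[j - 1]),    # match / substitution
                d[i - 1][j] + INS_COST.get(hyp[i - 1], DEFAULT_INDEL),  # insertion
                d[i][j - 1] + DEL_COST.get(ref[j - 1], DEFAULT_INDEL),  # deletion
            )
    return d[n][m]


def best_entry(
    hyp: Sequence[str], directory: Dict[str, List[Sequence[str]]]
) -> Tuple[str, float]:
    """Exhaustive search: score each entry by its best-aligning variant."""
    scored = (
        (name, min(align_cost(hyp, variant) for variant in variants))
        for name, variants in directory.items()
    )
    return min(scored, key=lambda item: item[1])
```

For example, with a toy directory of spelling variants such as `{"dubois": [list("dubois")], "dupont": [list("dupont"), list("dupond")]}`, a recognized spelling `list("dudois")` aligns with "dubois" at a cost of 0.4 (one cheap B/D substitution), so `best_entry` returns that entry. Because every entry is scored in full, the cost grows linearly with directory size, which is the bottleneck the abstract's tree-based pre-selection is meant to address.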

Keywords

acoustic signal processing; automatic telephone systems; decoding; grammars; hidden Markov models; probability; speech intelligibility; speech recognition; telephone networks; ENST; European project; French; HMM; Hidden Markov modeling; Picasso project; acoustic confusability; acoustic recognition phase; automated speech recognition system; bigram probabilities; database; decoding quality; deletion; directory name retrieval; dynamic alignment; grapheme to phoneme converter; insertion; letters; lexical search; name pronunciation; phone models; phonemes; pre-selection; rescoring strategy; speech units; spelled letters; spelled names; spelling variants; substitution costs; symbol sequences; telematics transaction services; telephone quality speech; training data; tree-based representation; word models; worldwide telephone network; Automatic speech recognition; Costs; Databases; Decoding; Hidden Markov models; Speech recognition; Telematics; Telephony; Testing; Training data;

fLanguage

English

Publisher

IEEE

Conference_Titel

Interactive Voice Technology for Telecommunications Applications, 1998. IVTTA '98. Proceedings. 1998 IEEE 4th Workshop

Conference_Location

Torino

Print_ISBN

0-7803-5028-6

Type

conf

DOI

10.1109/IVTTA.1998.727689

Filename

727689