Heterogeneous lexical units for automatic speech recognition: preliminary investigations

Author

Bazzi, Issam ; Glass, James

Author_Institution

Lab. for Comput. Sci., MIT, Cambridge, MA, USA

Volume

3

fYear

2000

fDate

2000

Firstpage

1257

Abstract

This paper explores the use of the phone and syllable as primary units of representation in the first stage of a two-stage recognizer. A finite-state transducer speech recognizer is configured as a two-stage process: the first stage computes either phone or syllable graphs, which are passed to the second stage to determine the most likely word hypotheses. Preliminary experiments in a weather information speech understanding domain show that a syllable representation with either bigram or trigram language models provides more constraint than a phonetic representation with a higher-order n-gram language model (up to a 6-gram), and approaches the performance of a more conventional single-stage word-based configuration.

Keywords

graph theory; speech recognition; automatic speech recognition; bigram language model; finite-state transducer speech recognizer; graphs; heterogeneous lexical units; performance; phone; representation; syllable; trigram language model; two-stage process; two-stage recognizer; weather information speech understanding domain; word hypotheses; Automatic speech recognition; Computer science; Glass; Information systems; Laboratories; Natural languages; Speech processing; Speech recognition; Transducers; Vocabulary

fLanguage

English

Publisher

IEEE

Conference_Title

Acoustics, Speech, and Signal Processing, 2000. ICASSP '00. Proceedings. 2000 IEEE International Conference on

Conference_Location

Istanbul

ISSN

1520-6149

Print_ISBN

0-7803-6293-4

Type

conf

DOI

10.1109/ICASSP.2000.861804

Filename

861804