Title :
Estimation of language models for new spoken language applications
Author_Institution :
Sch. of Comput. Sci., Carnegie Mellon Univ., Pittsburgh, PA, USA
Abstract :
Spoken language interfaces can provide natural communication for many database retrieval tasks. The CMU ATIS system provides an example of accessing airline information using spoken natural language queries. However, a large amount of training data is needed to develop a spoken language application. For example, training data is needed to generate a language model that the recognizer can use to reduce the search space. The author addresses some issues arising from the small amount of training data available for a new spoken language application. The author is working on a spoken language interface to access information from a library catalogue. The catalogue contains around 13,000 titles, 6,000 authors and 19,000 subjects. There are more than 20,000 words in the dictionary. The user can seek information about books, authors, subjects, publishers, etc. For example: “I’d like to see books dealing with science fiction by Clarke.” The author describes some language modelling experiments for this task. The author briefly describes a speech interface for a library catalogue, reviews class-based language models and describes their limitations. Finally, the author presents an approach to building statistical language models for new spoken language applications. This is important because large amounts of training data are normally needed to generate a language model, yet it is not practical to have or collect a large corpus of data for each new spoken language application.
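Class-based language models of the kind reviewed in the abstract are commonly written as P(w_i | w_{i-1}) ≈ P(c_i | c_{i-1}) · P(w_i | c_i), where c_i is the class of word w_i. The sketch below is an illustration of that general technique, not the author's method. It assumes hypothetical classes (AUTHOR, SUBJECT) whose members are taken from the catalogue, a uniform within-class distribution P(w | c) = 1/|c|, and unsmoothed maximum-likelihood estimates for the class bigrams. The function name `train_class_bigram` and the example data are invented for illustration.

```python
from collections import Counter


def train_class_bigram(sentences, classes):
    """Hypothetical class-based bigram: P(w | prev) = P(c(w) | c(prev)) * P(w | c(w)).

    classes maps an (uppercase) class name to its (lowercase) member words,
    e.g. authors listed in a catalogue. Words outside every class act as
    their own singleton class. P(w | c) is assumed uniform over class members.
    Class bigrams use unsmoothed maximum-likelihood estimates.
    """
    word_class = {w: c for c, members in classes.items() for w in members}

    def cls(word):
        return word_class.get(word, word)

    pair_counts = Counter()  # counts of (previous class, current class)
    hist_counts = Counter()  # counts of each class as a history
    for sent in sentences:
        tokens = ["<s>"] + sent.lower().split() + ["</s>"]
        for prev, cur in zip(tokens, tokens[1:]):
            pair_counts[(cls(prev), cls(cur))] += 1
            hist_counts[cls(prev)] += 1

    class_size = {c: len(members) for c, members in classes.items()}

    def prob(word, prev):
        c_prev, c_cur = cls(prev), cls(word)
        if hist_counts[c_prev] == 0:
            return 0.0  # unseen history: no estimate without smoothing
        p_class = pair_counts[(c_prev, c_cur)] / hist_counts[c_prev]
        # Uniform within-class probability; singleton classes get 1.0.
        p_word = 1.0 / class_size[c_cur] if c_cur in class_size else 1.0
        return p_class * p_word

    return prob


classes = {"AUTHOR": ["clarke", "asimov", "herbert"], "SUBJECT": ["fiction", "history"]}
p = train_class_bigram(["books by clarke", "books about history"], classes)
print(p("asimov", "by"))  # "asimov" never occurs in training
```

Because "asimov" never appears in the training sentences, a word-level bigram would give it zero probability after "by". The class model gives it 1/3, since "by" was followed by an AUTHOR once and AUTHOR has three members. This generalization from a small corpus is what makes class-based models attractive for a new application. The uniform P(w | c) and the lack of smoothing are simplifying assumptions that a practical model would revisit.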
Keywords :
library automation; natural language interfaces; query processing; speech recognition; CMU ATIS system; airline information access; class-based language models; database retrieval tasks; dictionary; language model estimation; library catalogue; natural communication; search space; speech recognizer; spoken language applications; spoken language interfaces; spoken natural language queries; statistical language models; training data; Application software; Books; Databases; Dictionaries; Libraries; Mutual funds; Natural languages; Speech; Telephony; Training data;
Conference_Title :
Spoken Language, 1996. ICSLP 96. Proceedings., Fourth International Conference on
Conference_Location :
Philadelphia, PA
Print_ISBN :
0-7803-3555-4
DOI :
10.1109/ICSLP.1996.607739