Robust understanding of spoken Chinese through character-based tagging and prior knowledge exploitation

Author

Xu, Weiqun ; Bao, Changchun ; Li, Yali ; Pan, Jielin ; Yan, Yonghong

Author_Institution

Key Lab. of Speech Acoust. & Content Understanding, Inst. of Acoust., Beijing, China

fYear

2011

fDate

11-15 Dec. 2011

Firstpage

413

Lastpage

418

Abstract

Robustness is one of the most challenging issues for spoken language understanding (SLU). In this paper we studied the semantic understanding of spoken Chinese for a voice search dialogue system. We first simplified the problem of semantic understanding into a named entity recognition (NER) task, which was further formulated as sequential tagging. Based on experiments, we chose the character rather than the word as the tagging unit. Two approaches were then proposed to incorporate prior knowledge, in the form of a domain lexicon, into the character-based tagging framework. One enriched the tagger features by adding more formal lexical features derived from the domain lexicon. The other made plain use of domain entities by simply adding them to the training data. Experimental results show that both approaches are effective. The best performance is achieved by combining these two complementary approaches. By exploiting prior knowledge we improved the NER performance from 75.27 to 90.24 in F₁ score on a field test set using speech recognizer output.
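
The abstract does not give implementation details, so the following is only a minimal sketch of what character-based tagging with a domain lexicon might look like. The BIO tag scheme, the `LOC` entity type, the `LB`/`LI`/`LO` feature names, the longest-match lookup, and all function names are assumptions made for illustration, not the authors' method.

```python
from typing import List, Set, Tuple


def char_bio_tags(text: str, entities: List[Tuple[int, int, str]]) -> List[str]:
    """Assign one BIO tag per character from (start, end, type) spans (end exclusive)."""
    tags = ["O"] * len(text)
    for start, end, etype in entities:
        tags[start] = f"B-{etype}"
        for i in range(start + 1, end):
            tags[i] = f"I-{etype}"
    return tags


def lexicon_features(text: str, lexicon: Set[str], max_len: int = 6) -> List[str]:
    """Mark each character as beginning (LB), inside (LI), or outside (LO)
    a lexicon entry, using greedy left-to-right longest match (an assumption)."""
    feats = ["LO"] * len(text)
    i = 0
    while i < len(text):
        matched = 0
        # Try the longest candidate first, then shorter ones.
        for n in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + n] in lexicon:
                matched = n
                break
        if matched:
            feats[i] = "LB"
            for j in range(i + 1, i + matched):
                feats[j] = "LI"
            i += matched
        else:
            i += 1
    return feats


def entry_as_example(entry: str, etype: str) -> Tuple[List[str], List[str]]:
    """Turn a lexicon entry into a fully tagged training sequence
    (one hypothetical reading of 'adding domain entities to the training data')."""
    return list(entry), char_bio_tags(entry, [(0, len(entry), etype)])


# Hypothetical utterance and lexicon, for illustration only.
utterance = "我想去北京大学"
domain_lexicon = {"北京", "北京大学"}
tags = char_bio_tags(utterance, [(3, 7, "LOC")])
feats = lexicon_features(utterance, domain_lexicon)
```

In a sketch like this, `feats` would be supplied to a sequence tagger as an extra per-character feature alongside the characters themselves, while `entry_as_example` would produce additional training sequences from the lexicon; combining both would correspond loosely to the combined setting the abstract reports as best.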

Keywords

natural language processing; speech processing; Chinese spoken language; character-based tagging framework; domain lexicon; formal lexical features; named entity recognition; prior knowledge exploitation; robust understanding; semantic understanding; sequential tagging; spoken Chinese; spoken language understanding; voice search dialogue system; Hidden Markov models; Robustness; Semantics; Speech; Speech recognition; Tagging; Training data

fLanguage

English

Publisher

IEEE

Conference_Titel

Automatic Speech Recognition and Understanding (ASRU), 2011 IEEE Workshop on

Conference_Location

Waikoloa, HI

Print_ISBN

978-1-4673-0365-1

Electronic_ISBN

978-1-4673-0366-8

Type

conf

DOI

10.1109/ASRU.2011.6163967

Filename

6163967