Regional Information Center for Science and Technology - Minimum discrimination information-based language model adaptation using tiny domain corpora for intelligent personal assistants

DocumentCode :

1401245

Title :

Minimum discrimination information-based language model adaptation using tiny domain corpora for intelligent personal assistants

Author :

Gil-Jin Jang ; Saejoon Kim ; Ji-Hwan Kim

Author_Institution :

Sch. of Electr. & Comput. Eng., Ulsan Nat. Inst. of Sci. & Technol., Ulsan, South Korea

Volume :

Issue :

fYear :

2012

fDate :

11/1/2012

Firstpage :

1359

Lastpage :

1365

Abstract :

This paper proposes a novel Language Model (LM) adaptation method based on Minimum Discrimination Information (MDI). In the proposed method, a background LM is viewed as a discrete distribution, and an adapted LM is built to be as close as possible to the background LM while satisfying unigram constraints. This approach is motivated by the limited amount of domain corpus available for adapting a natural language-based intelligent personal assistant system. Two unigram constraint estimation methods are proposed: one based on word frequency in the domain corpus, and one based on word similarity estimated from WordNet. When word frequency in tiny domain corpora (30 to 120 seconds in length) is used, the relative improvements in the adapted LM's perplexity range from 13.9% to 16.6%. Further relative improvements of 1.5% to 2.4% are observed when WordNet is used to generate word similarities. These results show an efficient way of re-scaling and normalizing the conditional distribution in an interpolation-based LM.
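The abstract describes the adapted LM as the distribution closest to the background LM (in the MDI, i.e. Kullback-Leibler, sense) that still satisfies unigram constraints. The following is a minimal Python sketch under stated assumptions, not the authors' implementation. A toy add-k smoothed bigram stands in for the background LM. The word-frequency constraint is assumed to be an interpolation of the tiny-domain unigram with the background unigram (weight `lam`). The MDI solution is approximated by the common closed-form "unigram scaling" rule P_A(w|h) = P_B(w|h) * alpha(w) / Z(h), with alpha(w) = (P_target(w) / P_B(w))^beta. All function names, the toy corpora, `lam` and `beta` are hypothetical, and the paper's exact formulation may differ.

```python
import math
from collections import Counter, defaultdict

def unigram(tokens, vocab, k=0.5):
    """Add-k smoothed unigram distribution over a fixed vocabulary."""
    counts = Counter(tokens)
    total = sum(counts[w] for w in vocab) + k * len(vocab)
    return {w: (counts[w] + k) / total for w in vocab}

def bigram(tokens, vocab, k=0.5):
    """Add-k smoothed background bigram P_B(w|h); every row is a full distribution."""
    pairs = defaultdict(Counter)
    for h, w in zip(tokens, tokens[1:]):
        pairs[h][w] += 1
    model = {}
    for h in vocab:
        row = pairs[h]
        total = sum(row[w] for w in vocab) + k * len(vocab)
        model[h] = {w: (row[w] + k) / total for w in vocab}
    return model

def target_unigram(domain_tokens, p_bg, lam=0.5):
    """Word-frequency constraint (assumed form): interpolate the tiny-domain
    unigram with the background unigram so unseen words keep some mass."""
    p_dom = unigram(domain_tokens, list(p_bg))
    return {w: lam * p_dom[w] + (1 - lam) * p_bg[w] for w in p_bg}

def mdi_adapt(bg_bigram, p_bg, p_target, beta=0.5):
    """Unigram-scaling approximation to MDI adaptation:
    P_A(w|h) = P_B(w|h) * alpha(w) / Z(h), alpha(w) = (P_target(w) / P_B(w)) ** beta."""
    alpha = {w: (p_target[w] / p_bg[w]) ** beta for w in p_bg}
    adapted = {}
    for h, row in bg_bigram.items():
        scaled = {w: p * alpha[w] for w, p in row.items()}
        z = sum(scaled.values())  # Z(h) renormalizes each history's row
        adapted[h] = {w: s / z for w, s in scaled.items()}
    return adapted

def perplexity(model, tokens):
    """Bigram perplexity over the transitions in `tokens`."""
    logp = sum(math.log(model[h][w]) for h, w in zip(tokens, tokens[1:]))
    return math.exp(-logp / (len(tokens) - 1))

# Hypothetical toy data: a background corpus and a tiny domain corpus.
background = "call mom read news call mom play music read news".split()
domain = "play music play music".split()
test = "music play music".split()

vocab = sorted(set(background))
p_bg = unigram(background, vocab)
bg_lm = bigram(background, vocab)
ad_lm = mdi_adapt(bg_lm, p_bg, target_unigram(domain, p_bg))

ppl_bg, ppl_ad = perplexity(bg_lm, test), perplexity(ad_lm, test)
print(f"relative perplexity improvement: {100 * (ppl_bg - ppl_ad) / ppl_bg:.1f}%")
```

Because each word's factor alpha(w) is shared across histories and only Z(h) depends on the history, the adapted model needs just the background conditionals plus one scaling factor per vocabulary word. The relative improvement is computed as (PPL_B - PPL_A) / PPL_B, presumably the same form as the percentages reported in the abstract.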

Keywords :

interpolation; mobile computing; natural language interfaces; notebook computers; performance evaluation; text analysis; LM adaptation method; MDI; WordNet; adapted LM perplexity; background LM; conditional distribution; discrete distribution; interpolation-based LM; language model adaptation method; minimum discrimination information; natural language-based intelligent personal assistant system; relative performance improvements; tiny domain corpora; unigram constraint estimation methods; word frequency; word similarity; Adaptation models; Estimation; Frequency domain analysis; Frequency estimation; Probability distribution; Semantics; Vocabulary; Constraint estimation; Language model adaptation; Minimum discrimination information; Tiny domain corpus;

fLanguage :

English

Journal_Title :

Consumer Electronics, IEEE Transactions on

Publisher :

IEEE

ISSN :

0098-3063

Type :

jour

DOI :

10.1109/TCE.2012.6415007

Filename :

6415007

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1401245