Regional Information Center for Science and Technology - Bayesian Constituent Context Model for Grammar Induction

DocumentCode :

18232

Title :

Bayesian Constituent Context Model for Grammar Induction

Author :

Min Zhang ; Xiangyu Duan ; Wenliang Chen

Author_Institution :

Sch. of Comput. Sci. & Technol., Soochow Univ., Suzhou, China

Volume :

Issue :

fYear :

2014

fDate :

Feb. 2014

Firstpage :

531

Lastpage :

541

Abstract :

The Constituent Context Model (CCM) is an effective generative model for grammar induction, whose aim is to induce hierarchical syntactic structure from natural text. The CCM simply defines a multinomial distribution over constituents, which leads to a severe data sparsity problem, because long constituents are unlikely to appear in unseen data sets. This paper proposes a Bayesian method for constituent smoothing by defining two kinds of prior distributions over constituents: the Dirichlet prior and the Pitman-Yor Process (PYP) prior. The Dirichlet prior functions as an additive smoothing method, and the PYP prior functions as a back-off smoothing method. Furthermore, a modified CCM is proposed to differentiate between left and right constituents in binary branching trees. Experiments show that both the proposed Bayesian smoothing method and the modified CCM are effective, and that combining them matches or significantly improves on state-of-the-art grammar induction performance, evaluated on standard treebanks of various languages.
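The contrast between additive smoothing (Dirichlet prior) and back-off smoothing (PYP prior) described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy constituents, the vocabulary size, the hyperparameters, the base distribution (POS-tag unigrams times a geometric length prior), and the simplification of one PYP "table" per observed constituent type are all hypothetical assumptions made for illustration. The Dirichlet estimate is the posterior mean of a multinomial under a symmetric Dirichlet prior, i.e. add-alpha smoothing; the PYP estimate is the standard Chinese-restaurant predictive probability, which discounts observed counts and redistributes that mass to a base distribution.

```python
from collections import Counter
from math import prod

# Hypothetical toy data: each constituent is a tuple of POS tags.
constituents = [("DT", "NN"), ("DT", "NN"), ("DT", "JJ", "NN"), ("NN",), ("VBD", "DT", "NN")]
counts = Counter(constituents)
total = sum(counts.values())  # N: number of observed constituent tokens


def dirichlet_prob(c, alpha, vocab_size):
    """Additive smoothing: posterior mean of a multinomial under a
    symmetric Dirichlet(alpha) prior over an assumed finite set of
    vocab_size constituent types."""
    return (counts[c] + alpha) / (total + alpha * vocab_size)


# Hypothetical base distribution p0 for the PYP: a geometric prior on
# constituent length times independent POS-tag unigram probabilities.
tag_counts = Counter(tag for c in constituents for tag in c)
tag_total = sum(tag_counts.values())


def base_prob(c, stop=0.5):
    length_prob = (1 - stop) ** (len(c) - 1) * stop
    return length_prob * prod(tag_counts[t] / tag_total for t in c)


def pyp_prob(c, discount, strength):
    """Pitman-Yor predictive probability (Chinese restaurant view),
    simplified by assuming exactly one table per observed type."""
    tables = len(counts)                 # T: total number of tables
    t_c = 1 if counts[c] > 0 else 0      # tables serving constituent c
    seen = (counts[c] - discount * t_c) / (strength + total)
    backoff = (strength + discount * tables) / (strength + total)
    return seen + backoff * base_prob(c)


# An unseen long constituent gets a tiny uniform share under the
# Dirichlet prior, but a structured share under the PYP back-off.
print(dirichlet_prob(("DT", "JJ", "JJ", "NN"), alpha=1.0, vocab_size=10))
print(pyp_prob(("DT", "JJ", "JJ", "NN"), discount=0.5, strength=1.0))
```

In this sketch the Dirichlet prior gives every unseen type the same probability regardless of its internal structure, while the PYP routes the discounted mass through the base distribution, so an unseen constituent built from frequent tags receives more probability than one built from rare tags.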

Keywords :

belief networks; statistical distributions; unsupervised learning; Bayesian constituent context model; CCM; Dirichlet prior; Pitman-Yor Process prior; additive smoothing method; back-off smoothing method; binary branching trees; constituent smoothing; data sparse problem; grammar induction; hierarchical syntactic structure; left constituents; multinomial distribution; prior distributions; right constituents; Additives; Bayes methods; Computational modeling; Context; Context modeling; Grammar; Smoothing methods; Bayesian; constituent context model; grammar induction; smoothing;

fLanguage :

English

Journal_Title :

IEEE/ACM Transactions on Audio, Speech, and Language Processing

Publisher :

IEEE

ISSN :

2329-9290

Type :

Journal article

DOI :

10.1109/TASLP.2013.2294584

Filename :

6680611

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=18232