Regional Information Center for Science and Technology - Latent Dirichlet learning for hierarchical segmentation

DocumentCode :

2172987

Title :

Latent Dirichlet learning for hierarchical segmentation

Author :

Chien, Jen-Tzung ; Chueh, Chuang-Hua

Author_Institution :

Dept. of Electr. & Comput. Eng., Nat. Chiao Tung Univ., Hsinchu, Taiwan

fYear :

2012

fDate :

23-26 Sept. 2012

Firstpage :

Lastpage :

Abstract :

A topic model can be built by using Dirichlet distributions as the prior to characterize latent topics in natural language. However, topics in real-world stream data are non-stationary, which makes training a reliable topic model challenging. Further, word usage varies across paragraphs within a document because of different composition styles. This study presents a hierarchical segmentation model that compensates for heterogeneous topics at the stream level and heterogeneous words at the document level. The topic similarity between sentences is calculated to form a beta prior for stream-level segmentation. This segmentation prior is used to group topic-coherent sentences into a pseudo-document. For each pseudo-document, a Markov chain is incorporated to detect stylistic segments within the document. The words in a segment are generated by the same composition style. The model is inferred by a variational Bayesian EM procedure. Experimental results show the benefits of the proposed model in terms of perplexity and F measure.
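
The abstract reports results in terms of perplexity and F measure without defining them. The sketch below shows the standard textbook forms of both metrics. It is not taken from the paper: the function names are hypothetical, and scoring segmentation by exact-match boundary positions is an assumption, since the record does not say how the authors computed the F measure.

```python
import math


def perplexity(log_probs):
    """Perplexity from per-word natural-log probabilities (lower is better)."""
    if not log_probs:
        raise ValueError("need at least one word probability")
    # exp of the negative average log-likelihood per word
    return math.exp(-sum(log_probs) / len(log_probs))


def boundary_f_measure(predicted, reference):
    """F measure over segment-boundary positions.

    Assumption: a predicted boundary counts as correct only if it matches
    a reference boundary exactly; the paper may score boundaries differently.
    """
    predicted, reference = set(predicted), set(reference)
    hits = len(predicted & reference)
    if hits == 0:
        return 0.0
    precision = hits / len(predicted)
    recall = hits / len(reference)
    # harmonic mean of precision and recall
    return 2 * precision * recall / (precision + recall)
```

For example, a model that assigns probability 0.25 to every word has perplexity 4, and predicting boundaries {3, 7, 12} against reference boundaries {3, 8, 12} gives precision and recall of 2/3 each, so the F measure is also 2/3.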

Keywords :

Markov processes; belief networks; inference mechanisms; learning (artificial intelligence); natural language processing; variational techniques; word processing; Dirichlet distributions; F measure; Markov chain; beta prior; composition styles; document paragraphs; heterogeneous topic compensation; heterogeneous word usage; hierarchical segmentation model; latent Dirichlet learning; latent topic characterization; natural language; nonstationary real-world stream data; perplexity; pseudodocument level; sentence topic similarity; stream-level segmentation; stylistic segment detection; topic model training; topic-coherent sentence grouping; variational Bayesian EM procedure; variational inference procedure; word generation; Computational modeling; Hidden Markov models; Machine learning; Markov processes; Reliability; Training; Vectors; Graphical Model; Hierarchical Segmentation; Machine Learning; Topic Model;

fLanguage :

English

Publisher :

IEEE

Conference_Titel :

Machine Learning for Signal Processing (MLSP), 2012 IEEE International Workshop on

Conference_Location :

Santander

ISSN :

1551-2541

Print_ISBN :

978-1-4673-1024-6

Electronic_ISBN :

1551-2541

Type :

conf

DOI :

10.1109/MLSP.2012.6349772

Filename :

6349772

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2172987