DocumentCode
2704387
Title
Probabilistic Document Correlation Model
Author
Jia, Xiping ; Peng, Hong
Author_Institution
South China Univ. of Technol., Guangzhou
fYear
2007
fDate
15-19 Dec. 2007
Firstpage
433
Lastpage
436
Abstract
The vector space model (VSM) and related models have recently become popular for document relationship analysis in text mining. However, they fail to discover document correlation at the topic level. This paper proposes a probabilistic document correlation (PDC) model to capture document correlation based on topics. The PDC model defines document correlation by the posterior probability of documents, and the posterior probability of each document is resolved by introducing the posterior probability of topics and topic similarity. Latent Dirichlet allocation (LDA), a generative topic model, is used for topic retrieval in this paper. Experiments on correlated document search show that the PDC model outperforms the VSM in average retrieval precision and document compression.
Keywords
information retrieval; probability; text analysis; vectors; latent Dirichlet allocation; posterior probability of documents; probabilistic document correlation model; text mining; topic retrieval; topic similarity; vector space model; Bipartite graph; Computational intelligence; Computer science; Computer security; Functional analysis; Optimal matching; Space technology; Text analysis; Text mining; Vocabulary
fLanguage
English
Publisher
ieee
Conference_Titel
Computational Intelligence and Security Workshops, 2007. CISW 2007. International Conference on
Conference_Location
Heilongjiang
Print_ISBN
978-0-7695-3073-4
Type
conf
DOI
10.1109/CISW.2007.4425527
Filename
4425527