• DocumentCode
    2704387
  • Title

    Probabilistic Document Correlation Model

  • Author

    Jia, Xiping ; Peng, Hong

  • Author_Institution
    South China Univ. of Technol., Guangzhou
  • fYear
    2007
  • fDate
    15-19 Dec. 2007
  • Firstpage
    433
  • Lastpage
    436
  • Abstract
    The vector space model (VSM) and related models have recently become popular for document relationship analysis in text mining. However, they fail to discover document correlation at the topic level. This paper proposes a probabilistic document correlation (PDC) model to capture document correlation based on topics. The PDC model defines document correlation by the posterior probability of documents, and the posterior probability of each document is resolved by introducing the posterior probability of topics and topic similarity. Latent Dirichlet allocation (LDA), a generative topic model, is used for topic retrieval in this paper. Experiments on correlated document search show that the PDC model outperforms the VSM in average retrieval precision and document compression.
  • Keywords
    information retrieval; probability; text analysis; vectors; latent Dirichlet allocation; posterior probability of documents; probabilistic document correlation model; text mining; topic retrieval; topic similarity; vector space model; Bipartite graph; Computational intelligence; Computer science; Computer security; Functional analysis; Optimal matching; Space technology; Text analysis; Text mining; Vocabulary
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computational Intelligence and Security Workshops, 2007. CISW 2007. International Conference on
  • Conference_Location
    Heilongjiang
  • Print_ISBN
    978-0-7695-3073-4
  • Type

    conf

  • DOI
    10.1109/CISW.2007.4425527
  • Filename
    4425527