• DocumentCode
    694736
  • Title
    New Features Acquisition of Text with Cloud-LDA Model
  • Author
    Maoyuan Zhang ; Fanli He ; Shuiyin Chen
  • Author_Institution
    Dept. of Comput. Sci. & Technol., Central China Normal Univ., Wuhan, China
  • fYear
    2013
  • fDate
    7-8 Dec. 2013
  • Firstpage
    267
  • Lastpage
    272
  • Abstract
    This paper investigates how to improve Information Retrieval (IR) by changing the feature distribution of the text. It introduces Cloud Model theory into the Latent Dirichlet Allocation (LDA) model to build a new feature selection system. The LDA model is used to mine the underlying topical structure: each topic is associated with a multinomial distribution over semantically related words. However, it is doubtful whether the topics themselves are semantically related to one another. Based on the probability distribution of vocabulary in the text given by the LDA model, the new system uses Cloud Model theory to automatically generate the set of features with a high contribution degree in the text. Results show that this feature set contains fewer features yet yields higher classification accuracy than currently popular feature selection methods. When a query is matched against words with a high contribution degree, the more such words an article contains, the more relevant the retrieved article is to the query. Experiments on SLIR (Single Language IR) with the NTCIR-5 (the 5th NII Test Collection for IR Systems) collections show that this method achieves a clear improvement over several other IR methods.
  • Keywords
    cloud computing; feature extraction; information retrieval; statistical distributions; text analysis; LDA model; NTCIR-5; SLIR; cloud model theory; cloud-LDA model; feature selection system; latent Dirichlet allocation; probability distribution; single language IR; text features acquisition; Computational modeling; Data models; Indexes; Semantics; Uncertainty; Cloud Model; feature
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information Science and Cloud Computing Companion (ISCC-C), 2013 International Conference on
  • Conference_Location
    Guangzhou
  • Type
    conf
  • DOI
    10.1109/ISCC-C.2013.94
  • Filename
    6973603