• DocumentCode
    2540891
  • Title

    Evaluation of clustering and summarizing in distributed latent semantic indexing

  • Author

    Behshameh, Mehdi ; Bashiri, Hassan ; Hooshmand, Salman

  • Author_Institution
    Dept. of Comput. Eng., Islamic Azad Univ. - Toyserkan Branch, Toyserkan, Iran
  • fYear
    2010
  • fDate
    16-18 April 2010
  • Firstpage
    49
  • Lastpage
    53
  • Abstract
    Latent Semantic Indexing (LSI) is a concept-based method used in information retrieval systems. In this method, a term-document matrix is built using term weighting techniques. This matrix is mapped to a conceptual space by mathematical decomposition techniques such as Singular Value Decomposition. The larger the collection of documents and key terms, the more elements the term-document matrix contains, which makes it difficult to manage. Such a large matrix needs more memory to store and more computation to process. On the assumption that distribution can decrease the required memory and reduce the run time, we implemented a distributed LSI. For further improvement, document clustering is applied as well. In this combination, a term-document matrix is rebuilt for each cluster, and retrieval is performed on this set of term-document matrices. We evaluate our combined method on the Hamshahri Collection, which is the largest collection in the Persian language. The evaluation shows a remarkable improvement compared with the non-combined LSI method.
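The pipeline the abstract describes (one weighted term-document matrix per cluster, an SVD of each, retrieval over the resulting set of matrices) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names are hypothetical, TF-IDF stands in for the unspecified "term weighting techniques", cluster assignments are assumed to be given, and merging results by cosine score is an assumption. The sketch runs in a single process, but each cluster model is independent, so in principle each could be built on a separate worker.

```python
import numpy as np

def build_term_doc(texts, vocab):
    """TF-IDF weighted term-document matrix (terms x docs). Weighting is an assumption."""
    index = {t: i for i, t in enumerate(vocab)}
    tf = np.zeros((len(vocab), len(texts)))
    for j, text in enumerate(texts):
        for tok in text.split():
            if tok in index:
                tf[index[tok], j] += 1.0
    df = np.count_nonzero(tf, axis=1)
    idf = np.log((1.0 + len(texts)) / (1.0 + df)) + 1.0
    return tf * idf[:, None]

def lsi_fit(A, k):
    """Rank-k truncated SVD; returns U_k, s_k and document coordinates V_k (docs x k)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = min(k, int(np.sum(s > 1e-12)))  # drop numerically zero singular values
    return U[:, :k], s[:k], Vt[:k, :].T

def cosine_rows(M, v):
    """Cosine similarity of each row of M with v; 0 where a norm is zero."""
    denom = np.linalg.norm(M, axis=1) * np.linalg.norm(v)
    scores = np.zeros(M.shape[0])
    nz = denom > 0
    scores[nz] = (M[nz] @ v) / denom[nz]
    return scores

def search_clusters(clusters, vocab, query, k=2):
    """clusters: dict cluster_id -> list of (doc_id, text). Returns [(doc_id, score)]."""
    index = {t: i for i, t in enumerate(vocab)}
    q = np.zeros(len(vocab))
    for tok in query.split():
        if tok in index:
            q[index[tok]] += 1.0
    results = []
    for docs in clusters.values():
        ids = [d for d, _ in docs]
        A = build_term_doc([t for _, t in docs], vocab)
        U_k, s_k, V_k = lsi_fit(A, k)
        q_hat = (q @ U_k) / s_k  # fold the query into this cluster's concept space
        for doc_id, score in zip(ids, cosine_rows(V_k, q_hat)):
            results.append((doc_id, float(score)))
    # Merge per-cluster rankings into one list, best score first (assumed strategy).
    return sorted(results, key=lambda r: r[1], reverse=True)
```

Because each cluster gets its own, smaller matrix, every SVD works on fewer columns than a single decomposition of the whole collection would, which is the memory and run-time motivation the abstract gives.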
  • Keywords
    indexing; information retrieval systems; natural language processing; pattern clustering; singular value decomposition; Hamshahri collection; Persian language; clustering evaluation; distributed LSI method; distributed latent semantic indexing summarization; information retrieval systems; mathematical decomposition techniques; noncombinational LSI method; singular value decomposition; term weighting techniques; term-document matrix; Clustering algorithms; Data mining; Distributed computing; Indexing; Information retrieval; Large scale integration; Matrix decomposition; Runtime; Singular value decomposition; Space technology; Clustering; Information retrieval; Latent semantic indexing; Precision; Recall; Summarization;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information Management and Engineering (ICIME), 2010 The 2nd IEEE International Conference on
  • Conference_Location
    Chengdu
  • Print_ISBN
    978-1-4244-5263-7
  • Electronic_ISBN
    978-1-4244-5265-1
  • Type

    conf

  • DOI
    10.1109/ICIME.2010.5477470
  • Filename
    5477470