Title :
Evaluation of clustering and summarizing in distributed latent semantic indexing
Author :
Behshameh, Mehdi ; Bashiri, Hassan ; Hooshmand, Salman
Author_Institution :
Dept. of Comput. Eng., Islamic Azad Univ. - Toyserkan Branch, Toyserkan, Iran
Abstract :
Latent Semantic Indexing (LSI) is a concept-based method used in information retrieval systems. In this method, a term-document matrix is built using term weighting techniques and is then mapped to a conceptual space by a mathematical decomposition technique such as Singular Value Decomposition. The larger the collection of documents and key terms, the more elements the term-document matrix contains, and the harder it becomes to manage: a very large matrix needs more memory to store and more computation to solve. On the assumption that distribution can reduce both the required memory and the run time, we studied and implemented a distributed LSI. To improve the results further, the documents are also clustered. In this combined method, a term-document matrix is rebuilt for each cluster, and retrieval is performed over this set of term-document matrices. We evaluate the combined method on the Hamshahri Collection, which is the largest collection in the Persian language. The evaluation shows a remarkable improvement over the non-combined LSI method.
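The per-cluster construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the tf-idf weighting, the whitespace tokenizer, the toy documents, the cluster labels and the function names `term_document_matrix` and `per_cluster_matrices` are all assumptions made for illustration, and the decomposition step (for example an SVD of each per-cluster matrix) is left out.

```python
import math
from collections import Counter


def term_document_matrix(docs):
    """Return (vocab, matrix); matrix[i][j] is the tf-idf weight of term i in doc j.

    Assumed weighting: raw term frequency times log(N / document frequency).
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({term for tokens in tokenized for term in tokens})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    counts = [Counter(tokens) for tokens in tokenized]
    matrix = [
        [counts[j][term] * math.log(n_docs / df[term]) for j in range(n_docs)]
        for term in vocab
    ]
    return vocab, matrix


def per_cluster_matrices(docs, labels):
    """Rebuild a separate term-document matrix for each cluster of documents."""
    clusters = {}
    for doc, label in zip(docs, labels):
        clusters.setdefault(label, []).append(doc)
    return {label: term_document_matrix(members)
            for label, members in clusters.items()}


# Hypothetical toy collection and cluster assignment (not from the paper).
docs = [
    "matrix decomposition svd",
    "svd matrix rank",
    "persian news retrieval",
    "news retrieval query",
]
labels = [0, 0, 1, 1]

full_vocab, full = term_document_matrix(docs)
parts = per_cluster_matrices(docs, labels)

# Number of matrix cells to store: one global matrix vs. the per-cluster set.
full_cells = len(full) * len(full[0])
part_cells = sum(len(m) * len(m[0]) for _, m in parts.values())
print(full_cells, part_cells)
```

In this toy case the two clusters share no vocabulary, so the per-cluster matrices together hold fewer cells than the single global matrix. How much is saved on a real collection depends on how well the clustering separates the vocabulary.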
Keywords :
indexing; information retrieval systems; natural language processing; pattern clustering; singular value decomposition; Hamshahri collection; Persian language; clustering evaluation; distributed LSI method; distributed latent semantic indexing summarization; mathematical decomposition techniques; noncombinational LSI method; term weighting techniques; term-document matrix; Clustering algorithms; Data mining; Distributed computing; Information retrieval; Large scale integration; Matrix decomposition; Runtime; Space technology; Clustering; Latent semantic indexing; Precision; Recall; Summarization
Conference_Title :
Information Management and Engineering (ICIME), 2010 The 2nd IEEE International Conference on
Conference_Location :
Chengdu
Print_ISBN :
978-1-4244-5263-7
Electronic_ISBN :
978-1-4244-5265-1
DOI :
10.1109/ICIME.2010.5477470