Title :
Comparison Probabilistic Latent Semantic Indexing Model In Chinese Information Retrieval
Author :
Fang, Xie ; Xiaoguang, Liu ; Quan, Hu
Author_Institution :
Coll. of Comput. Sci., Hubei Univ. of Technol., Wuhan, China
Abstract :
As the amount of information on the Internet grows, Web mining has become a focus of information retrieval. Web clustering groups similar Web documents according to a chosen similarity metric. However, classical clustering algorithms search the solution space without direction and do not capture semantic features. In this paper, probabilistic latent semantic indexing (PLSI) models built on word segmentation, two-grams, and key words extraction, respectively, are compared. For comparison, vector space models using the different Chinese information retrieval techniques are tested at the same time. The experimental results show that correct word segmentation clearly improves retrieval precision for the PLSI model, but it is not effective for the vector space model. The index based on key words extraction achieves the highest accuracy for the PLSI model.
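To make the two-gram indexing option concrete, the following is a minimal sketch, not the authors' implementation. It assumes overlapping character two-grams as index terms and a plain term-frequency cosine similarity as the vector space baseline. The function names `char_bigrams` and `cosine` and the sample strings are hypothetical.

```python
from collections import Counter
from math import sqrt


def char_bigrams(text):
    """Return overlapping character two-grams, ignoring whitespace."""
    chars = [c for c in text if not c.isspace()]
    return [chars[i] + chars[i + 1] for i in range(len(chars) - 1)]


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Index a short document and a query by their two-grams (illustrative strings).
doc = Counter(char_bigrams("信息检索模型"))
query = Counter(char_bigrams("信息检索"))
score = cosine(doc, query)
```

Two-gram indexing needs no dictionary or segmenter, which is why it is a common baseline against word segmentation in Chinese retrieval. A PLSI model would take the same term counts as input and fit latent topics on top of them.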
Keywords :
Internet; data mining; indexing; information retrieval; Chinese information retrieval; PLSI model; Web clustering; Web documents; Web mining; key words extraction; probabilistic latent semantic indexing model; word segmentation; Application software; Clustering algorithms; Educational institutions; Information analysis; Information technology; Machine assisted indexing; Space technology; N-Grams retrieval; probabilistic latent semantic indexing
Conference_Title :
Information Technology and Applications, 2009. IFITA '09. International Forum on
Conference_Location :
Chengdu
Print_ISBN :
978-0-7695-3600-2
DOI :
10.1109/IFITA.2009.532