DocumentCode :
1898666
Title :
Web Text Clustering Based on Concept Lattice
Author :
Shi, Yimin ; Zhang, Jun ; Zhang, Xianzhong ; Li, Yanxia
Author_Institution :
Inf. Sci. & Technol. Coll., Dalian Maritime Univ., Dalian, China
fYear :
2010
fDate :
25-26 Dec. 2010
Firstpage :
1
Lastpage :
4
Abstract :
Most web text clustering methods are based on the vector space model of text representation. This yields a high-dimensional term space, which increases time complexity, and it loses text semantics because the semantic relationships between terms are not considered. This paper takes a different approach: a concept lattice is constructed by treating texts as objects and the terms of each text as attributes. Formal concepts are then extracted from the lattice to represent the texts, and a similarity function between concepts is defined. To address drawbacks of the existing K-Means algorithm, such as the random selection of initial centers, a method is proposed that takes both density and distance factors into account. The new algorithm has been applied to the clustering module of our existing maritime vertical search engine, "Haisou". The results demonstrate improved clustering efficiency and accuracy.
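The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration, since the record does not give the paper's definitions: the toy contexts, the brute-force concept enumeration, the Jaccard similarity on term sets (a stand-in for the paper's unspecified similarity function), and the density-times-separation seeding rule (one plausible reading of "density and distance factors"). The names `formal_concepts`, `jaccard`, and `pick_seeds` are hypothetical.

```python
from itertools import combinations


def formal_concepts(context):
    """Enumerate formal concepts of a small context by brute force.

    context maps each object (text id) to its set of attributes (terms).
    Returns a set of (extent, intent) pairs as frozensets.
    Exponential in the number of terms, so only suitable for toy inputs.
    """
    objects = set(context)
    attributes = set().union(*context.values()) if context else set()

    def extent(terms):
        # All texts that contain every term in `terms`.
        return frozenset(o for o in objects if terms <= context[o])

    def intent(objs):
        # All terms shared by every text in `objs`.
        if not objs:
            return frozenset(attributes)
        return frozenset(set.intersection(*(set(context[o]) for o in objs)))

    concepts = set()
    for r in range(len(attributes) + 1):
        for combo in combinations(sorted(attributes), r):
            ext = extent(set(combo))
            concepts.add((ext, intent(ext)))
    return concepts


def jaccard(a, b):
    """Jaccard similarity of two term sets (assumed stand-in similarity)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def pick_seeds(items, sim, k):
    """Choose k initial centers that are dense and mutually dissimilar.

    Assumed rule: density is the mean similarity to all other items; the
    first seed has the highest density, and each later seed maximizes
    density * (1 - highest similarity to any seed already chosen).
    Returns a list of indices into `items`.
    """
    n = len(items)
    density = {
        i: sum(sim(items[i], items[j]) for j in range(n) if j != i) / max(n - 1, 1)
        for i in range(n)
    }
    seeds = [max(density, key=density.get)]
    while len(seeds) < min(k, n):
        candidates = [i for i in range(n) if i not in seeds]

        def score(i):
            return density[i] * (1 - max(sim(items[i], items[s]) for s in seeds))

        seeds.append(max(candidates, key=score))
    return seeds


# Toy context: three texts described by their terms (made-up data).
context = {
    "d1": {"ship", "port"},
    "d2": {"ship", "cargo"},
    "d3": {"ship", "port", "cargo"},
}
concepts = formal_concepts(context)

# Toy term sets forming two obvious groups (made-up data).
term_sets = [
    {"ship", "port"},
    {"ship", "port", "cargo"},
    {"storm", "wave"},
    {"storm", "wave", "wind"},
]
seeds = pick_seeds(term_sets, jaccard, k=2)
```

In this toy run the two seeds fall in different groups, which is the behavior a non-random initialization aims for. A practical system would replace the brute-force enumeration with an incremental lattice construction algorithm.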
Keywords :
Internet; computational complexity; pattern clustering; search engines; text analysis; Haisou; Web text clustering; concept lattice; k-means algorithm; maritime vertical searching engine; similarity function; space vector text representation model; time complexity; Accidents; Clustering algorithms; Complexity theory; Context; Feature extraction; Lattices; Semantics;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Information Engineering and Computer Science (ICIECS), 2010 2nd International Conference on
Conference_Location :
Wuhan
ISSN :
2156-7379
Print_ISBN :
978-1-4244-7939-9
Electronic_ISBN :
2156-7379
Type :
conf
DOI :
10.1109/ICIECS.2010.5678243
Filename :
5678243