Research on Text Clustering Algorithms

Author

Li Qun ; Huang Xinyuan

Author_Institution

Sch. of Inf. Sci. & Technol., Beijing Forestry Univ., Beijing, China

fYear

2010

fDate

27-28 Nov. 2010

Abstract

The volume of Web documents is enormous. Text clustering places the documents that share the most words into the same cluster, so that a Web search engine can structure the large result set returned for a given query. In this article, we study three kinds of clustering algorithms: prototype-based, density-based, and hierarchical. We compare two typical algorithms, K-medoids and DBSCAN. The results show that K-medoids is sensitive to the choice of initial center points and that DBSCAN performs better.

Keywords

pattern clustering; query processing; search engines; text analysis; DBSCAN; K-medoids; Web document; Web search engine; density based clustering; hierarchical clustering algorithms; prototype based clustering; text clustering algorithm; Algorithm design and analysis; Classification algorithms; Clustering algorithms; Films; Forestry; Noise; Partitioning algorithms

fLanguage

English

Publisher

IEEE

Conference_Title

Database Technology and Applications (DBTA), 2010 2nd International Workshop on

Conference_Location

Wuhan

Print_ISBN

978-1-4244-6975-8

Electronic_ISBN

978-1-4244-6977-2

Type

conf

DOI

10.1109/DBTA.2010.5659055

Filename

5659055

Link To Document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=49&DC=3453854