Title :
Community structure of the Chinese document network based on content similarity
Author :
Pan, Xin ; Liu, Jian-Guo ; Deng, Guishi
Author_Institution :
Inst. of Syst. Eng., Dalian Univ. of Technol., Dalian, China
Abstract :
Based on complex network theory, we propose a clustering algorithm based on content similarity. First, the Chinese documents are represented with the vector-space model, and the content similarity between any two documents is computed as their cosine similarity. Each network node is then defined as a document, and each edge weight as the cosine similarity between the two documents it connects. The document connectivity network is constructed from this document-to-document similarity graph. If the edge weight between two nodes is smaller than a constant threshold, it is set to zero. Using the edge betweenness of the network, we reconstruct the hierarchical structure of the funding proposal network: compute the edge betweenness, remove the edge with the largest betweenness, and repeat this process until all edges are removed. Using an open dataset provided by Fudan University, we experimentally compare the performance of the partition clustering algorithm with that of other algorithms, such as K-means and Bisecting K-means. The numerical results indicate that our algorithm is more efficient than the K-means and Bisecting K-means algorithms. In addition, the numerical results are robust to different values of the threshold constant. Finally, the algorithm is applied to the proposal network, and the community structure based on content similarity is detected.
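The pipeline described in the abstract (vector-space representation, cosine similarity, thresholding, then repeated removal of the edge with the largest betweenness) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It assumes documents are already segmented into whitespace-separated tokens, uses raw term counts as the vector-space weights, and treats the thresholded graph as unweighted when computing betweenness. All function names and the toy documents are hypothetical.

```python
import math
from collections import Counter, deque
from itertools import combinations


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * \
        math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_graph(docs, threshold):
    """Adjacency sets; edges with similarity below the threshold are dropped."""
    # Assumption: tokens are whitespace-separated (Chinese text would need
    # a word-segmentation step first).
    vecs = [Counter(d.split()) for d in docs]
    adj = {i: set() for i in range(len(docs))}
    for i, j in combinations(range(len(docs)), 2):
        if cosine(vecs[i], vecs[j]) >= threshold:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def edge_betweenness(adj):
    """Unweighted edge betweenness via breadth-first shortest-path counting."""
    bet = Counter()
    for s in adj:
        dist, sigma = {s: 0}, Counter({s: 1})
        preds = {v: [] for v in adj}
        order, queue = [], deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = Counter()
        for w in reversed(order):
            for v in preds[w]:
                c = sigma[v] / sigma[w] * (1 + delta[w])
                bet[frozenset((v, w))] += c
                delta[v] += c
    return bet


def components(adj):
    """Connected components as sorted lists of node ids."""
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, stack = set(), [s]
        while stack:
            v = stack.pop()
            if v not in comp:
                comp.add(v)
                stack.extend(adj[v] - comp)
        seen |= comp
        comps.append(sorted(comp))
    return comps


def hierarchy(adj):
    """Remove the highest-betweenness edge until no edges remain.

    Returns the partition recorded each time the component count grows.
    """
    adj = {v: set(n) for v, n in adj.items()}  # work on a copy
    levels = [components(adj)]
    while any(adj.values()):
        bet = edge_betweenness(adj)
        u, v = sorted(max(bet, key=lambda e: (bet[e], sorted(e))))
        adj[u].discard(v)
        adj[v].discard(u)
        comps = components(adj)
        if len(comps) > len(levels[-1]):
            levels.append(comps)
    return levels


# Hypothetical toy corpus; the threshold value is arbitrary.
docs = ["network community graph", "community graph edge",
        "cluster kmeans text", "kmeans text vector"]
for level in hierarchy(build_graph(docs, threshold=0.3)):
    print(level)
```

Recomputing betweenness after every removal is what makes this divisive scheme expensive on large networks. The threshold controls how sparse the initial graph is, which is presumably the constant whose robustness the abstract reports.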
Keywords :
document handling; pattern clustering; text analysis; Chinese document network; K-means; clustering algorithm; community structure; complex network theory; content similarity; vector-space model; Algorithm design and analysis; Clustering algorithms; Communities; Complex networks; Data mining; Partitioning algorithms; Proposals; complex network
Conference_Title :
Fuzzy Systems and Knowledge Discovery (FSKD), 2010 Seventh International Conference on
Conference_Location :
Yantai, Shandong
Print_ISBN :
978-1-4244-5931-5
DOI :
10.1109/FSKD.2010.5569332