DocumentCode :
694712
Title :
A Three-Stage Clustering Framework Based on Multiple Feature Combination for Chinese Person Name Disambiguation
Author :
Fei Wang ; Yi Yang ; Zhaocai Ma ; Lian Li
Author_Institution :
Sch. of Inf. Sci. & Eng., Lanzhou Univ., Lanzhou, China
fYear :
2013
fDate :
7-8 Dec. 2013
Firstpage :
103
Lastpage :
109
Abstract :
To address name ambiguity and improve the performance of person name disambiguation, we propose a three-stage clustering algorithm. In the first stage, organizations and locations (OLs) are used to cluster documents about the same person, so that texts with greater resemblance are assigned to the same category. This stage is simply document clustering based on OL similarity. In the second stage, the clustered documents serve as a new data source from which novel features (such as co-author names) are extracted. These newly extracted features are used to perform additional clustering between documents. We also propose a method that resolves name ambiguity by constructing social networks from the relationships among co-authors. In the third stage, Web pages are further clustered using a content-based hierarchical agglomerative clustering (HAC) algorithm, which analyzes useful content, including titles, abstracts, and keywords (TAKs), to disambiguate the ambiguous names. Experimental results show that the three-stage clustering algorithm effectively improves the performance of person name disambiguation.
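The first stage described above (grouping documents by OL similarity) can be illustrated with a minimal sketch. The abstract does not specify the similarity measure, the linkage rule, or the stopping threshold, so Jaccard similarity, single-link merging, the threshold of 0.5, and the names `jaccard` and `cluster_by_ol` are all assumptions made here for illustration, not the authors' method.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_by_ol(docs, threshold=0.5):
    """Single-link agglomerative clustering over OL sets (illustrative only).

    docs: dict mapping a document id to its set of organization/location strings.
    threshold: assumed minimum similarity required to merge two clusters.
    """
    clusters = [{doc_id} for doc_id in docs]
    while True:
        best, best_pair = threshold, None
        for i, j in combinations(range(len(clusters)), 2):
            # Single link: similarity of the closest pair of documents.
            sim = max(
                jaccard(docs[a], docs[b])
                for a in clusters[i]
                for b in clusters[j]
            )
            if sim >= best:
                best, best_pair = sim, (i, j)
        if best_pair is None:
            return clusters  # no remaining pair reaches the threshold
        i, j = best_pair  # i < j, so deleting j leaves index i valid
        clusters[i] |= clusters[j]
        del clusters[j]


# Hypothetical sample documents for the same ambiguous name.
sample_docs = {
    "d1": {"Lanzhou University", "Lanzhou"},
    "d2": {"Lanzhou University", "Lanzhou", "Gansu"},
    "d3": {"Peking University", "Beijing"},
}
print(cluster_by_ol(sample_docs))
```

In this sketch, documents sharing most of their OLs end up in one cluster, while a document with unrelated OLs stays separate; the later stages (co-author features, content-based HAC over TAKs) would refine these groups.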
Keywords :
natural language processing; pattern clustering; social networking (online); text analysis; Chinese person name disambiguation performance enhancement; OL similarity; TAK; Web page clustering; ambiguous name disambiguation; co-author names; co-author relationships; content-based HAC algorithm; content-based hierarchical agglomerative clustering algorithm; data source; document clustering; feature extraction; multiple feature combination; name ambiguity problems; organization-and-location; social network construction; three-stage clustering framework; title-and-abstract-and-keywords; useful content analyzing; Abstracts; Clustering algorithms; Educational institutions; Feature extraction; Organizations; Social network services; Vectors; hierarchical agglomerative clustering; person name disambiguation; social networks;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Information Science and Cloud Computing Companion (ISCC-C), 2013 International Conference on
Conference_Location :
Guangzhou
Type :
conf
DOI :
10.1109/ISCC-C.2013.33
Filename :
6973577