DocumentCode :
178471
Title :
On Validation of Clustering Techniques for Bibliographic Databases
Author :
Mishra, S. ; Saha, S. ; Mondal, S.
Author_Institution :
Dept. of Comput. Sci. & Eng., Indian Inst. of Technol. Patna, Patna, India
fYear :
2014
fDate :
24-28 Aug. 2014
Firstpage :
3150
Lastpage :
3155
Abstract :
In entity name disambiguation, evaluating the performance of any approach is difficult because the correct (actual) results are often not known. Evaluation generally relies on three measures: precision, recall, and F-measure. All three are external validity indices because they require gold standard data. But in bibliographic databases such as DBLP, Arnetminer, Scopus, Web of Science, and Google Scholar, gold standard data is not easily available and is very difficult to obtain because of the overlapping nature of the data. There is therefore a need for other metrics for evaluation. In this paper, schemes based on internal cluster validity indices are proposed for evaluating entity name disambiguation algorithms applied to bibliographic data without using any gold standard datasets. Two new internal validity indices are also proposed for this purpose. Experimental results on seven bibliographic datasets show that the proposed internal cluster validity indices can compare the results obtained by different methods without prior or gold standard information. The paper thus demonstrates a novel way of evaluating any entity matching algorithm for bibliographic datasets without using any prior or gold standard information.
Keywords :
bibliographic systems; database management systems; pattern clustering; DBLP; Scopus; Web of Science; Arnetminer; bibliographic databases; clustering technique validation; disambiguation algorithms; entity matching algorithm; external validity indices; F-measure; Google Scholar; performance evaluation; Clustering algorithms; Equations; Gold; Indexes; Information services; Mathematical model; Standards
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Pattern Recognition (ICPR), 2014 22nd International Conference on
Conference_Location :
Stockholm
ISSN :
1051-4651
Type :
conf
DOI :
10.1109/ICPR.2014.543
Filename :
6977255