  • DocumentCode
    178471
  • Title
    On Validation of Clustering Techniques for Bibliographic Databases
  • Author
    Mishra, S. ; Saha, S. ; Mondal, S.
  • Author_Institution
    Dept. of Comput. Sci. & Eng., Indian Inst. of Technol. Patna, Patna, India
  • fYear
    2014
  • fDate
    24-28 Aug. 2014
  • Firstpage
    3150
  • Lastpage
    3155
  • Abstract
    In entity name disambiguation, evaluating the performance of any approach is difficult because the correct (actual) results are often not known. Three measures are generally used for evaluation: precision, recall, and F-measure. All three are external validity indices because they require gold standard data. But in bibliographic databases such as DBLP, Arnetminer, Scopus, Web of Science, and Google Scholar, gold standard data is not easily available and is very difficult to obtain because of the overlapping nature of the data. There is therefore a need for other metrics for evaluation. In this paper, some schemes based on internal cluster validity indices are proposed for evaluating entity name disambiguation algorithms applied to bibliographic data without using any gold standard datasets. Two new internal validity indices are also proposed for this purpose. Experimental results on seven bibliographic datasets reveal that the proposed internal cluster validity indices are able to compare the results obtained by different methods without prior/gold standard information. The paper thus demonstrates a novel way of evaluating any entity matching algorithm on bibliographic datasets without using any prior/gold standard information.
  • Keywords
    bibliographic systems; database management systems; pattern clustering; DBLP; Scopus; Web-of-science; arnetminer; bibliographic databases; clustering technique validation; disambiguation algorithms; entity matching algorithm; external validity indices; f-measure; google scholar; performance evaluation; Clustering algorithms; Equations; Gold; Indexes; Information services; Mathematical model; Standards;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Pattern Recognition (ICPR), 2014 22nd International Conference on
  • Conference_Location
    Stockholm
  • ISSN
    1051-4651
  • Type
    conf
  • DOI
    10.1109/ICPR.2014.543
  • Filename
    6977255