• DocumentCode
    168294
  • Title
    Combining domain-specific heuristics for author name disambiguation
  • Author
    Santana, Alan Filipe ; Goncalves, Marcos Andre ; Laender, Alberto H. F. ; Ferreira, Andre
  • Author_Institution
    Dept. de Cienc. da Comput., Univ. Fed. de Minas Gerais, Belo Horizonte, Brazil
  • fYear
    2014
  • fDate
    8-12 Sept. 2014
  • Firstpage
    173
  • Lastpage
    182
  • Abstract
    Author name disambiguation has been one of the hardest problems faced by digital libraries since their early days. Historically, supervised solutions have empirically outperformed those based on heuristics, but with the burden of having to rely on manually labelled training sets for the learning process. Moreover, most supervised solutions just apply some type of generic machine learning solution and do not exploit specific knowledge about the problem. In this paper, we follow a similar reasoning, but in the opposite direction. Instead of extending an existing supervised solution, we propose a set of carefully designed heuristics and similarity functions and apply supervision only to optimize such parameters for each particular dataset. As our experiments show, the result is a very effective, efficient and practical author name disambiguation method that can be used in many different scenarios.
  • Keywords
    data analysis; digital libraries; learning (artificial intelligence); author name disambiguation; dataset; domain-specific heuristics; generic machine learning solution; heuristics; similarity functions; supervised solutions; Electronic mail; Equations; Mathematical model; Measurement; Training; Training data; Vectors; Name Disambiguation; Supervised Methods
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Digital Libraries (JCDL), 2014 IEEE/ACM Joint Conference on
  • Conference_Location
    London
  • Type
    conf
  • DOI
    10.1109/JCDL.2014.6970165
  • Filename
    6970165