• DocumentCode
    3485809
  • Title

    Identification of Investigator Name Zones Using SVM Classifiers and Heuristic Rules

  • Author

    Kim, Jongwoo ; Le, Daniel X. ; Thoma, George R.

  • Author_Institution
    Nat. Libr. of Med., Bethesda, MD, USA
  • fYear
    2013
  • fDate
    25-28 Aug. 2013
  • Firstpage
    140
  • Lastpage
    144
  • Abstract
    The research reported in biomedical articles often involves large numbers of investigators at different institutions. To properly credit these investigators, an article's authors frequently name them together in some part of the article. These Investigator Names (IN) now constitute a required field in the MEDLINE® citation for the article. The automated extraction of these names is implemented in a system developed by a research group at the U.S. National Library of Medicine, consisting of three modules based on Support Vector Machine (SVM) classifiers and heuristic rules. The SVM classifiers label text blocks ("zones") that possibly contain Investigator Names, and the heuristic rules identify the actual zones. We collect eleven sets of word lists to train and test the classifiers, each set containing 100 to 56,000 words. Experimental results on online biomedical articles show a Precision of 0.90, a Recall of 0.95, an F-Measure of 0.92, and an Accuracy of 0.99.
  • Keywords
    bioinformatics; citation analysis; pattern classification; support vector machines; text analysis; MEDLINE citation; SVM classifier label text blocks; SVM classifiers; US National Library of Medicine; article authors; automated extraction; classifier testing; classifier training; heuristic rules; investigator name zone identification; online biomedical articles; support vector machine classifiers; word lists; Accuracy; Classification algorithms; Data mining; Labeling; Libraries; Merging; Support vector machines; Investigator Names; MEDLINE; Support Vector Machine; bibliographic information; heuristic rules; labeling;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis and Recognition (ICDAR), 2013 12th International Conference on
  • Conference_Location
    Washington, DC
  • ISSN
    1520-5363
  • Type

    conf

  • DOI
    10.1109/ICDAR.2013.35
  • Filename
    6628600