DocumentCode
3485809
Title
Identification of Investigator Name Zones Using SVM Classifiers and Heuristic Rules
Author
Kim, Jongwoo; Le, Daniel X.; Thoma, George R.
Author_Institution
Nat. Libr. of Med., Bethesda, MD, USA
fYear
2013
fDate
25-28 Aug. 2013
Firstpage
140
Lastpage
144
Abstract
The research reported in biomedical articles often involves large numbers of investigators at different institutions. To properly credit these investigators, an article's authors frequently name them together in some part of the article. These Investigator Names (IN) now constitute a required field in the MEDLINE® citation for the article. The automated extraction of these names is implemented in a system developed by a research group at the U.S. National Library of Medicine, consisting of three modules based on Support Vector Machine (SVM) classifiers and heuristic rules. The SVM classifiers label text blocks ("zones") that possibly contain Investigator Names, and the heuristic rules identify the actual zones. We collect eleven sets of word lists to train and test the classifiers, each set containing 100 to 56,000 words. Experimental results on online biomedical articles show a Precision of 0.90, a Recall of 0.95, an F-Measure of 0.92, and an Accuracy of 0.99.
Keywords
bioinformatics; citation analysis; pattern classification; support vector machines; text analysis; MEDLINE citation; SVM classifier label text blocks; SVM classifiers; US National Library of Medicine; article authors; automated extraction; classifier testing; classifier training; heuristic rules; investigator name zone identification; online biomedical articles; support vector machine classifiers; word lists; Accuracy; Classification algorithms; Data mining; Labeling; Libraries; Merging; Support vector machines; Investigator Names; MEDLINE; Support Vector Machine; bibliographic information; labeling
fLanguage
English
Publisher
IEEE
Conference_Titel
Document Analysis and Recognition (ICDAR), 2013 12th International Conference on
Conference_Location
Washington, DC
ISSN
1520-5363
Type
conf
DOI
10.1109/ICDAR.2013.35
Filename
6628600