DocumentCode :
173735
Title :
Automated method for extracting “citation sentences” from online biomedical articles using SVM-based text summarization technique
Author :
In Cheol Kim ; Le, Daniel X. ; Thoma, George R.
Author_Institution :
Lister Hill Nat. Center for Biomed. Commun., Bethesda, MD, USA
fYear :
2014
fDate :
5-8 Oct. 2014
Firstpage :
1991
Lastpage :
1996
Abstract :
Comment-on (CON), a MEDLINE citation field, indicates previously published articles commented on by the authors of a given article, who may express complimentary or contradictory opinions. Our approach to identifying the CON list for a given article is first to extract all “citation sentences” from the body text, then to recognize the sentences among these that mention CON articles (“CON sentences”), and finally to analyze the corresponding bibliographic data in the reference section. As a preprocessing step for identifying the CON list, this paper presents a general method for extracting “citation sentences” from the body text of online biomedical articles using a support vector machine (SVM)-based text summarization technique. Input feature vectors for the SVM are created by combining four types of features: 1) word statistics representing how differently a word occurs in “citation sentences” compared to other sentences, and the existence of 2) author names, 3) publication years, and 4) citation tags in a sentence. A rule-based post-processing step is also introduced to further reduce false negative errors in detecting “citation sentences”. Experiments on a set of online biomedical articles show that an SVM with a radial basis function (RBF) kernel achieves good overall performance in terms of accuracy, precision, recall, and F-measure. Our experiments also show that errors in extracting “citation sentences” cause a minor degradation in the performance of CON sentence identification, and that this degradation can be reduced through the proposed rule-based post-processing.
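The four feature types and the evaluation measures named in the abstract can be illustrated with a minimal Python sketch. Everything below is an assumption made for illustration and is not taken from the paper: the regular expressions, the smoothed log-ratio used as the word statistic, the function names `word_scores`, `features`, `post_process`, and `prf`, and the single post-processing rule are all hypothetical. The resulting four-element vectors are the kind of input one might pass to an SVM with an RBF kernel; the classifier itself is omitted here.

```python
import math
import re
from collections import Counter

# Hypothetical patterns; the paper's actual detectors are not given in the abstract.
YEAR_RE = re.compile(r"\b(?:19|20)\d{2}\b")                      # publication year
TAG_RE = re.compile(r"\[\d+(?:\s*[,-]\s*\d+)*\]")                # citation tag such as [12] or [3, 4]
AUTHOR_RE = re.compile(r"\b[A-Z][a-z]+ (?:et al\.|and [A-Z][a-z]+)")  # "Smith et al." / "Smith and Jones"


def word_scores(citation_sents, other_sents):
    """Assumed word statistic: smoothed log-ratio of a word's relative
    frequency in citation sentences versus other sentences."""
    cit = Counter(w for s in citation_sents for w in s.lower().split())
    oth = Counter(w for s in other_sents for w in s.lower().split())
    vocab = set(cit) | set(oth)
    n_cit = sum(cit.values()) + len(vocab)
    n_oth = sum(oth.values()) + len(vocab)
    return {w: math.log(((cit[w] + 1) / n_cit) / ((oth[w] + 1) / n_oth))
            for w in vocab}


def features(sentence, scores):
    """Build [word statistic, has author name, has year, has citation tag]."""
    words = sentence.lower().split()
    stat = sum(scores.get(w, 0.0) for w in words) / max(len(words), 1)
    return [stat,
            float(bool(AUTHOR_RE.search(sentence))),
            float(bool(YEAR_RE.search(sentence))),
            float(bool(TAG_RE.search(sentence)))]


def post_process(pred, feats):
    """Hypothetical rule aimed at false negatives: a sentence that carries
    a citation tag is relabeled as a citation sentence."""
    return 1 if pred == 1 or feats[3] == 1.0 else 0


def prf(y_true, y_pred):
    """Precision, recall, and F-measure for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


cit_sent = "Smith et al. reported similar results in 2009 [12]."
other_sent = "The samples were incubated overnight."
scores = word_scores([cit_sent], [other_sent])
cit_vec = features(cit_sent, scores)
other_vec = features(other_sent, scores)
```

In this toy setup the citation sentence triggers all three binary features and the other sentence triggers none. A real system would estimate the word statistic from a labeled corpus and would need far more robust author-name and citation-tag detection than these patterns provide.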
Keywords :
citation analysis; medical information systems; radial basis function networks; statistics; support vector machines; text analysis; word processing; CON articles; CON list; CON sentences; F-measure rate; MEDLINE citation field; RBF; SVM-based text summarization technique; automated citation sentence extraction method; body text; citation tags; comment-on citation field; input feature vectors; online biomedical articles; precision rate; recall rate; rule-based post-processing step; support vector machine text summarization technique; word statistics; Accuracy; Feature extraction; HTML; Pragmatics; Support vector machine classification; Vectors; “citation sentence” extraction; MEDLINE®; online biomedical documents; support vector machine;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Systems, Man and Cybernetics (SMC), 2014 IEEE International Conference on
Conference_Location :
San Diego, CA
Type :
conf
DOI :
10.1109/SMC.2014.6974213
Filename :
6974213