Title :
CRF-based Bibliography Extraction from Reference Strings Focusing on Various Token Granularities
Author :
Ohta, Manabu ; Arauchi, Daiki ; Takasu, Atsuhiro ; Adachi, Jun
Author_Institution :
Okayama Univ., Okayama, Japan
Abstract :
The references of academic articles include important bibliographic elements such as authors' names and article titles. Automatic extraction of these elements is useful because they can be used for various purposes, including searching. In this paper, a method for automatically extracting bibliographic elements from the text of reference strings is proposed. The proposed method assigns bibliographic labels to reference strings by using linguistic information and conditional random fields. Experimental results indicated that the extraction accuracy for the major bibliographic elements exceeded 96%.
Keywords :
bibliographies; citation analysis; probability; random processes; text analysis; CRF-based bibliography extraction; academic article references; article title; author name; automatic bibliographic element extraction; bibliographic label assignment; conditional probability; conditional random field; label sequence; linguistic information; reference string text; searching; token granularity; Accuracy; Bibliographies; Data mining; Data models; Digital signal processing; Hidden Markov models; Labeling; bibliography extraction; conditional random field (CRF); delimiter; reference; tokenization;
Conference_Title :
Document Analysis Systems (DAS), 2012 10th IAPR International Workshop on
Conference_Location :
Gold Coast, QLD
Print_ISBN :
978-1-4673-0868-7
DOI :
10.1109/DAS.2012.28