DocumentCode
2011720
Title
CRF-based Bibliography Extraction from Reference Strings Focusing on Various Token Granularities
Author
Ohta, Manabu ; Arauchi, Daiki ; Takasu, Atsuhiro ; Adachi, Jun
Author_Institution
Okayama Univ., Okayama, Japan
fYear
2012
fDate
27-29 March 2012
Firstpage
276
Lastpage
281
Abstract
The references of academic articles include important bibliographic elements such as authors' names and article titles. Automatic extraction of these elements is useful because they can be used for various purposes, including searching. In this paper, a method for automatically extracting bibliographic elements from the text of reference strings is proposed. The proposed method assigns bibliographic labels to reference strings by using linguistic information and conditional random fields. Experimental results indicated that the extraction accuracies of major bibliographies were more than 96%.
Keywords
bibliographies; citation analysis; probability; random processes; text analysis; CRF-based bibliography extraction; academic article references; article title; author name; automatic bibliographic element extraction; bibliographic label assignment; conditional probability; conditional random field; label sequence; linguistic information; reference string text; searching; token granularity; Accuracy; Bibliographies; Data mining; Data models; Digital signal processing; Hidden Markov models; Labeling; bibliography extraction; conditional random field (CRF); delimiter; reference; tokenization;
fLanguage
English
Publisher
IEEE
Conference_Titel
Document Analysis Systems (DAS), 2012 10th IAPR International Workshop on
Conference_Location
Gold Coast, QLD
Print_ISBN
978-1-4673-0868-7
Type
conf
DOI
10.1109/DAS.2012.28
Filename
6195378