  • DocumentCode
    2011720
  • Title
    CRF-based Bibliography Extraction from Reference Strings Focusing on Various Token Granularities
  • Author
    Ohta, Manabu; Arauchi, Daiki; Takasu, Atsuhiro; Adachi, Jun
  • Author_Institution
    Okayama Univ., Okayama, Japan
  • fYear
    2012
  • fDate
    27-29 March 2012
  • Firstpage
    276
  • Lastpage
    281
  • Abstract
    The references of academic articles include important bibliographic elements such as authors' names and article titles. Automatically extracting these elements is useful because they serve various purposes, including search. This paper proposes a method for automatically extracting bibliographic elements from the text of reference strings. The method assigns bibliographic labels to reference strings using linguistic information and conditional random fields (CRFs). Experimental results indicated that the extraction accuracies for the major bibliographic elements exceeded 96%.
  • Keywords
    bibliographies; citation analysis; probability; random processes; text analysis; CRF-based bibliography extraction; academic article references; article title; author name; automatic bibliographic element extraction; bibliographic label assignment; conditional probability; conditional random field; label sequence; linguistic information; reference string text; searching; token granularity; Accuracy; Bibliographies; Data mining; Data models; Digital signal processing; Hidden Markov models; Labeling; bibliography extraction; conditional random field (CRF); delimiter; reference; tokenization;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis Systems (DAS), 2012 10th IAPR International Workshop on
  • Conference_Location
    Gold Coast, QLD
  • Print_ISBN
    978-1-4673-0868-7
  • Type
    conf
  • DOI
    10.1109/DAS.2012.28
  • Filename
    6195378