• DocumentCode
    644014
  • Title

    Word-level information extraction from science and technology announcements corpus based on CRF

  • Author

    Yushu Cao ; Jun Wang ; Lei Li

  • Author_Institution
    Sch. of Eng. & Appl. Sci., Univ. of Pennsylvania, Philadelphia, PA, USA
  • Volume
    03
  • fYear
    2012
  • fDate
    Oct. 30 2012-Nov. 1 2012
  • Firstpage
    1529
  • Lastpage
    1533
  • Abstract
    Conditional Random Fields (CRFs) have been widely applied in information extraction and natural language processing. However, they have seen little use on corpora of science and technology declarations. In this paper, we extract word-level information from a large corpus of science and technology announcements and analyze the performance of CRF, using Naïve Bayes as a baseline. Our experiments show that CRF achieves high precision except on a few unknown data. The Naïve Bayes model is also satisfactory in closed domains, but it always makes mistakes when the data belong to a less weighted class.
  • Keywords
    information resources; natural language processing; scientific information systems; text analysis; CRF; closed domains; conditional random field; naïve Bayes; science and technology announcements corpus; science and technology declarations; word-level information; word-level information extraction; Data mining; Data models; Hidden Markov models; Information retrieval; Niobium; Testing; Training; information extraction; science and technology corpus; word-level
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Cloud Computing and Intelligent Systems (CCIS), 2012 IEEE 2nd International Conference on
  • Conference_Location
    Hangzhou
  • Print_ISBN
    978-1-4673-1855-6
  • Type

    conf

  • DOI
    10.1109/CCIS.2012.6664640
  • Filename
    6664640