• DocumentCode
    1809595
  • Title

    Information Extraction incorporating Paragraph Feature and Hidden Markov Model

  • Author

    Na, Liu ; Mingyu, Lu ; Huanling, Tang

  • fYear
    2007
  • fDate
    18-21 Sept. 2007
  • Firstpage
    953
  • Lastpage
    956
  • Abstract
    As the volume of data on the Internet continues to grow, information extraction has become a fundamental and effective means of handling large quantities of text. This paper proposes an information extraction method that incorporates paragraph features and a hidden Markov model. The method takes paragraphs, rather than words, as the unit of analysis; a paragraph is a text sequence saved from a web page after preprocessing. Every paragraph is converted into special tokens, which serve as the observation symbols of the hidden Markov model. All experiments are carried out on a set of EBM Web pages. The extracted information includes title, author, affiliation, journal, etc. The experimental results show that this method can improve precision and recall to some degree.
  • Keywords
    feature extraction; hidden Markov models; information extraction; paragraph feature; Automata; Computer science; Data mining; Filling; IP networks; Parallel processing; Spatial databases; Web pages
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Network and Parallel Computing Workshops, 2007. NPC Workshops. IFIP International Conference on
  • Conference_Location
    Liaoning
  • Print_ISBN
    978-0-7695-2943-1
  • Type

    conf

  • DOI
    10.1109/NPC.2007.109
  • Filename
    4351609
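
The abstract describes converting each paragraph into a token and using those tokens as the observation symbols of a hidden Markov model. The following is a minimal sketch of that idea, not the authors' implementation: the token names (`PLAIN`, `NAME_LIST`, `HAS_ORG`, `HAS_DIGITS`), the tokenizing heuristics, and every probability value are hypothetical and chosen only for illustration. The hidden states are the four fields the abstract names, and decoding uses the standard Viterbi algorithm.

```python
import math

# Hidden states: the fields named in the abstract.
STATES = ["title", "author", "affiliation", "journal"]


def paragraph_to_token(paragraph: str) -> str:
    """Map a paragraph to one observation symbol (hypothetical heuristics)."""
    text = paragraph.strip()
    lowered = text.lower()
    if "university" in lowered or "department" in lowered:
        return "HAS_ORG"
    if any(ch.isdigit() for ch in text):
        return "HAS_DIGITS"
    if "," in text and len(text.split()) <= 12:
        return "NAME_LIST"
    return "PLAIN"


# Hypothetical model parameters; a real system would estimate them from labeled pages.
START = {"title": 0.7, "author": 0.1, "affiliation": 0.1, "journal": 0.1}
TRANS = {
    "title": {"title": 0.1, "author": 0.7, "affiliation": 0.1, "journal": 0.1},
    "author": {"title": 0.05, "author": 0.15, "affiliation": 0.7, "journal": 0.1},
    "affiliation": {"title": 0.05, "author": 0.1, "affiliation": 0.25, "journal": 0.6},
    "journal": {"title": 0.1, "author": 0.1, "affiliation": 0.1, "journal": 0.7},
}
EMIT = {
    "title": {"PLAIN": 0.7, "NAME_LIST": 0.1, "HAS_ORG": 0.1, "HAS_DIGITS": 0.1},
    "author": {"PLAIN": 0.2, "NAME_LIST": 0.6, "HAS_ORG": 0.1, "HAS_DIGITS": 0.1},
    "affiliation": {"PLAIN": 0.1, "NAME_LIST": 0.1, "HAS_ORG": 0.7, "HAS_DIGITS": 0.1},
    "journal": {"PLAIN": 0.2, "NAME_LIST": 0.1, "HAS_ORG": 0.1, "HAS_DIGITS": 0.6},
}


def viterbi(tokens):
    """Return the most likely state sequence for a list of paragraph tokens."""
    if not tokens:
        return []
    # Log-probabilities avoid underflow on long sequences.
    score = {s: math.log(START[s]) + math.log(EMIT[s][tokens[0]]) for s in STATES}
    back = []
    for tok in tokens[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            new_score[s] = score[prev] + math.log(TRANS[prev][s]) + math.log(EMIT[s][tok])
            pointers[s] = prev
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    path = [max(STATES, key=lambda s: score[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


if __name__ == "__main__":
    # Made-up paragraphs standing in for preprocessed web-page text.
    paragraphs = [
        "Information Extraction incorporating Paragraph Feature and Hidden Markov Model",
        "Na Liu, Mingyu Lu, Huanling Tang",
        "Department of Computer Science, Example University",
        "Example Journal 12(3), 2007",
    ]
    tokens = [paragraph_to_token(p) for p in paragraphs]
    print(list(zip(tokens, viterbi(tokens))))
```

Working at the paragraph level keeps the observation sequence short (one symbol per paragraph), which is the design choice the abstract emphasizes over word-level modeling.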