• DocumentCode
    3038595
  • Title
    Extraction technology of blog comments based on functional semantic units
  • Author
    Chun-long, Fan ; Hui, Meng
  • Author_Institution
    Dept. of Comput., Shenyang Aerosp. Univ., ShenYang, China
  • Volume
    3
  • fYear
    2012
  • fDate
    25-27 May 2012
  • Firstpage
    422
  • Lastpage
    426
  • Abstract
    Blogs are an important kind of network information resource, and extracting their comment information is essential for research such as public opinion analysis. In this paper we summarize the prevalent extraction algorithms for blog comments and describe how to use page structure in information extraction. Indicator phrases such as "Home" carry clear semantics and functional indications when people read web pages. These indicator phrases are known as Functional Semantic Units (FSU). Based on the characteristics of FSUs, we propose a comment information extraction model and present its design rationale and implementation process in detail, including page structure linearization, identification of functional semantic units, main text recognition, and the comment extraction algorithm. Finally, experiments show that the comment information extraction model is effective and achieves better identification results.
  • Keywords
    Web sites; information retrieval; FSU; Web pages; blog comments; comment information extracting model; comments extraction algorithm; extraction technology; functional semantic units; network information resources; page structure; public opinion analysis; Arrays; Blogs; Data mining; Feature extraction; Layout; Semantics; Web pages; blog; comment; functional semantic unit; information extraction;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Science and Automation Engineering (CSAE), 2012 IEEE International Conference on
  • Conference_Location
    Zhangjiajie
  • Print_ISBN
    978-1-4673-0088-9
  • Type
    conf
  • DOI
    10.1109/CSAE.2012.6272985
  • Filename
    6272985