• DocumentCode
    3765457
  • Title
    A webpage information extraction method based on game theory
  • Author
    Bohai Yu; Zhang Xia; Zhengyou Xia
  • Author_Institution
    College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, China
  • fYear
    2015
  • fDate
    7/1/2015 12:00:00 AM
  • Firstpage
    35
  • Lastpage
    39
  • Abstract
    With the development of Web 2.0, many websites, especially news websites, publish information through their own CMS (content management system). Extracting information from such diverse webpages has therefore become an increasingly popular research topic, and many methods have been proposed that extract valid content adaptively. In this paper we propose a method based on game theory that efficiently extracts the main text from a webpage. The target label is found by means of a label game. The method consists of two steps: (a) filtering out the script and style tags in the webpage, then dividing the entire HTML page into blocks at div tags; (b) extracting features from the blocks and finding the Nash equilibrium of the game matrix. Experiments on several websites verify that the game-theoretic model is valid and performs better.
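    The two steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not say which block features are used or how the game matrix is built, so several pieces are hypothetical: the features (text length and link density), the two players that each nominate a block, the agreement bonus `bonus`, and the joint-payoff tie-break. Only Python's standard library (`html.parser`) is used.

```python
from html.parser import HTMLParser


class DivBlockParser(HTMLParser):
    """Step (a): drop <script>/<style> content and cut the page into blocks at <div> boundaries."""

    def __init__(self):
        super().__init__()
        self.blocks = []      # each block: {"text": ..., "link_text": ...}
        self._skip_depth = 0  # > 0 while inside <script> or <style>
        self._link_depth = 0  # > 0 while inside <a>
        self._text, self._link_text = [], []

    def _flush(self):
        # Normalize whitespace and keep only non-empty blocks.
        text = " ".join(" ".join(self._text).split())
        link_text = " ".join(" ".join(self._link_text).split())
        if text:
            self.blocks.append({"text": text, "link_text": link_text})
        self._text, self._link_text = [], []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "div":
            self._flush()
        elif tag == "a":
            self._link_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag == "div":
            self._flush()
        elif tag == "a":
            self._link_depth = max(0, self._link_depth - 1)

    def handle_data(self, data):
        if self._skip_depth:
            return  # filtered script/style content
        self._text.append(data)
        if self._link_depth:
            self._link_text.append(data)

    def close(self):
        super().close()
        self._flush()  # emit any trailing text after the last <div>


def split_blocks(html):
    parser = DivBlockParser()
    parser.feed(html)
    parser.close()
    return parser.blocks


def pure_nash_equilibria(A, B):
    """All pure-strategy Nash equilibria (i, j) of the bimatrix game (A, B)."""
    rows, cols = len(A), len(A[0])
    return [
        (i, j)
        for i in range(rows)
        for j in range(cols)
        if all(A[i][j] >= A[k][j] for k in range(rows))   # row player cannot improve
        and all(B[i][j] >= B[i][k] for k in range(cols))  # column player cannot improve
    ]


def extract_main_text(html, bonus=1.0):
    """Step (b), hypothetical version: two players each nominate a block as the main text."""
    blocks = split_blocks(html)
    if not blocks:
        return ""
    lengths = [len(b["text"]) for b in blocks]
    max_len = max(lengths)
    # Assumed player 1 prefers long blocks; assumed player 2 prefers blocks with few links.
    u1 = [n / max_len for n in lengths]
    u2 = [1.0 - len(b["link_text"]) / len(b["text"]) for b in blocks]
    size = len(blocks)
    # Both players earn `bonus` when they nominate the same block (agreement incentive).
    A = [[u1[i] + (bonus if i == j else 0.0) for j in range(size)] for i in range(size)]
    B = [[u2[j] + (bonus if i == j else 0.0) for j in range(size)] for i in range(size)]
    equilibria = pure_nash_equilibria(A, B)
    if not equilibria:
        return ""
    # Among several equilibria, keep the one with the highest joint payoff.
    i, _ = max(equilibria, key=lambda e: A[e[0]][e[1]] + B[e[0]][e[1]])
    return blocks[i]["text"]


SAMPLE_HTML = """<html><head><style>body { color: red }</style>
<script>var counter = 1;</script></head><body>
<div class="nav"><a href="/">Home</a> <a href="/news">News</a> <a href="/about">About</a></div>
<div class="content">Game theory can be used to pick the main text block of a news page.
This paragraph is deliberately longer than the navigation and footer blocks.</div>
<div class="footer">Copyright 2015 <a href="/contact">Contact</a></div>
</body></html>"""
```

    For `SAMPLE_HTML`, the navigation block consists entirely of link text and the footer is short, so `extract_main_text(SAMPLE_HTML)` returns the paragraph beginning "Game theory can be used". With `bonus >= 1`, every profile in which both players nominate the same block is an equilibrium, so the final choice rests on the joint-payoff tie-break. A different matrix construction may have no pure-strategy equilibrium, in which case this sketch returns an empty string.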
  • Publisher
    iet
  • Conference_Titel
    Smart and Sustainable City and Big Data (ICSSC), 2015 International Conference on
  • Type
    conf
  • DOI
    10.1049/cp.2015.0252
  • Filename
    7446435