A webpage information extraction method based on game theory

Author

Bohai Yu;Zhang Xia;Zhengyou Xia

Author_Institution

College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, China

fYear

2015

fDate

7/1/2015

Firstpage

Lastpage

Abstract

With the development of Web 2.0, many websites, especially news websites, publish information through their own CMS (content management system). Extracting information from different webpages has become an increasingly popular research topic, and many researchers have proposed methods that extract valid content adaptively. In this paper, we propose a method based on game theory to efficiently extract the main text from a webpage. The target label is found by means of a label game. Our method consists of two steps: (a) filtering out the script and style tags in the webpage and then dividing the entire HTML page into blocks using the div tag; (b) extracting features from the blocks and finding the Nash equilibrium of the game-theory matrix. Experiments on a number of websites verify that our game-theory-based model is valid and gives better results.
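The two steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the block features or the payoffs of the label game, so the names `DivBlockParser`, `split_div_blocks`, and `pure_nash_equilibria` are hypothetical, and the payoff matrices are left for the caller to supply. The sketch only shows (a) dropping script and style content and splitting a page into div blocks, and (b) locating pure-strategy Nash equilibria in a two-player payoff matrix.

```python
from html.parser import HTMLParser


class DivBlockParser(HTMLParser):
    """Collect visible text per <div>, ignoring <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.skip_depth = 0   # > 0 while inside <script> or <style>
        self.stack = []       # one text buffer per currently open <div>
        self.blocks = []      # finished blocks, in the order their <div> closes

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self.skip_depth += 1
        elif tag == "div":
            self.stack.append([])

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self.skip_depth > 0:
            self.skip_depth -= 1
        elif tag == "div" and self.stack:
            # Normalize whitespace; keep only non-empty blocks.
            text = " ".join(" ".join(self.stack.pop()).split())
            if text:
                self.blocks.append(text)

    def handle_data(self, data):
        # Text is credited to the innermost open <div> only.
        if self.skip_depth == 0 and self.stack:
            self.stack[-1].append(data)


def split_div_blocks(html):
    """Step (a): return the text of each <div> block in the page."""
    parser = DivBlockParser()
    parser.feed(html)
    parser.close()
    return parser.blocks


def pure_nash_equilibria(row_payoff, col_payoff):
    """Step (b), generic form: pure-strategy Nash equilibria of a bimatrix game.

    row_payoff[i][j] and col_payoff[i][j] are the payoffs to the row and
    column player when they play strategies i and j. How the paper maps
    block features to payoffs is not stated in the abstract (assumption).
    """
    n_rows, n_cols = len(row_payoff), len(row_payoff[0])
    equilibria = []
    for i in range(n_rows):
        for j in range(n_cols):
            # Row player cannot gain by switching rows within column j.
            row_best = all(row_payoff[i][j] >= row_payoff[k][j] for k in range(n_rows))
            # Column player cannot gain by switching columns within row i.
            col_best = all(col_payoff[i][j] >= col_payoff[i][k] for k in range(n_cols))
            if row_best and col_best:
                equilibria.append((i, j))
    return equilibria
```

In such a sketch, each block returned by `split_div_blocks` would be scored by whatever features the method uses, and the resulting payoff matrices passed to `pure_nash_equilibria` to pick the block that holds the main text. A game may have no pure-strategy equilibrium, in which case the function returns an empty list.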

Publisher

IET

Conference_Title

Smart and Sustainable City and Big Data (ICSSC), 2015 International Conference on

Type

conf

DOI

10.1049/cp.2015.0252

Filename

7446435

Link To Document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=49&DC=3765457