DocumentCode
3765457
Title
A webpage information extraction method based on game theory
Author
Bohai Yu;Zhang Xia;Zhengyou Xia
Author_Institution
College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, China
fYear
2015
fDate
7/1/2015 12:00:00 AM
Firstpage
35
Lastpage
39
Abstract
With the development of Web 2.0, many websites, especially news websites, publish information through their own CMS (content management system). Extracting information from different webpages has therefore become an increasingly popular research topic, and many researchers have proposed methods that extract valid content adaptively. In this paper, we propose a method based on game theory to efficiently extract the main text from a webpage, finding the target label by means of a label game. Our method consists of two steps: (a) filtering out the script and style tags in the webpage and then dividing the entire HTML page into blocks using the div tag; (b) extracting features from the blocks and finding the Nash equilibrium of the game-theory matrix. Experiments on several websites verify that our game-theory-based model is valid and yields better results.
Publisher
IET
Conference_Titel
Smart and Sustainable City and Big Data (ICSSC), 2015 International Conference on
Type
conf
DOI
10.1049/cp.2015.0252
Filename
7446435