DocumentCode :
3765457
Title :
A webpage information extraction method based on game theory
Author :
Bohai Yu;Zhang Xia;Zhengyou Xia
Author_Institution :
College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, China
fYear :
2015
fDate :
7/1/2015 12:00:00 AM
Firstpage :
35
Lastpage :
39
Abstract :
With the development of Web 2.0, many websites, especially news websites, publish information through their own CMS (content management system). Extracting information from different webpages has therefore become an increasingly popular research topic, and many researchers have proposed methods that extract valid content adaptively. In this paper we propose a method based on game theory to efficiently extract the main text from a webpage, finding the target label by means of a label game. Our method consists of two steps: (a) filtering out the script and style tags in the webpage, then dividing the entire HTML page into blocks by div tag; (b) extracting features from the blocks and finding the Nash equilibrium of the game-theory matrix. Experiments on several websites verify that our model based on game theory is valid and gives better extraction results.
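The two steps in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify the block features or the payoff matrix, so everything below is an assumption made for illustration: the features (text length, link density), the two-player payoff design, the `bonus` parameter, the fallback rule, and the names `DivBlockParser`, `pure_nash`, and `extract_main_text` are hypothetical and are not the authors' label game.

```python
from html.parser import HTMLParser
from itertools import product


class DivBlockParser(HTMLParser):
    """Step (a): skip <script>/<style> content and split the page into <div> blocks."""

    def __init__(self):
        super().__init__()
        self.blocks = []    # finished blocks: {"text": str, "link_text": str}
        self._stack = []    # <div> blocks that are currently open
        self._skip = 0      # depth inside <script>/<style>
        self._in_link = 0   # depth inside <a>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "div":
            self._stack.append({"text": "", "link_text": ""})
        elif tag == "a":
            self._in_link += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1
        elif tag == "div" and self._stack:
            self.blocks.append(self._stack.pop())
        elif tag == "a" and self._in_link:
            self._in_link -= 1

    def handle_data(self, data):
        text = data.strip()
        if self._skip or not self._stack or not text:
            return
        block = self._stack[-1]  # attribute text to the innermost open <div>
        block["text"] += text + " "
        if self._in_link:
            block["link_text"] += text + " "


def pure_nash(A, B):
    """Return all pure-strategy Nash equilibria (i, j) of the bimatrix game (A, B)."""
    rows, cols = range(len(A)), range(len(A[0]))
    return [
        (i, j)
        for i, j in product(rows, cols)
        if all(A[i][j] >= A[k][j] for k in rows)     # row player cannot improve
        and all(B[i][j] >= B[i][l] for l in cols)    # column player cannot improve
    ]


def extract_main_text(html, bonus=0.5):
    """Step (b), under assumed features and payoffs (not taken from the paper)."""
    parser = DivBlockParser()
    parser.feed(html)
    parser.close()
    blocks = [b for b in parser.blocks if b["text"]]
    if not blocks:
        return ""
    longest = max(len(b["text"]) for b in blocks)
    # Assumed features: relative text length and share of non-link text.
    size = [len(b["text"]) / longest for b in blocks]
    clean = [1 - len(b["link_text"]) / len(b["text"]) for b in blocks]
    n = len(blocks)
    # Assumed game: the row player prefers long blocks, the column player prefers
    # link-free blocks, and both earn a bonus when they pick the same block.
    A = [[size[i] + (bonus if i == j else 0) for j in range(n)] for i in range(n)]
    B = [[clean[j] + (bonus if i == j else 0) for j in range(n)] for i in range(n)]
    agreed = [(i, j) for i, j in pure_nash(A, B) if i == j]
    if agreed:
        i, _ = max(agreed, key=lambda e: A[e[0]][e[1]] + B[e[0]][e[1]])
    else:
        # Assumed fallback when the players agree on no pure equilibrium.
        i = max(range(n), key=lambda k: size[k] * clean[k])
    return blocks[i]["text"].strip()
```

Calling `extract_main_text(page_html)` returns the text of the block on which both hypothetical players agree in equilibrium. A real implementation would replace the assumed features and payoffs with those defined in the paper.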
Publisher :
IET
Conference_Title :
Smart and Sustainable City and Big Data (ICSSC), 2015 International Conference on
Type :
conf
DOI :
10.1049/cp.2015.0252
Filename :
7446435