Title :
Semantics-Based Extraction of Webpage Main Text
Author :
Fengjiao, Han ; Zhurong, Zhou
Author_Institution :
Coll. of Comput. & Inf. Sci., Southwest Univ., Chongqing, China
Abstract :
Extraction of web page main text is one of the most efficient methods for improving search engines. The traditional method extracts the main text by using the similarity of DOM sub-trees as the end condition for traversing the DOM tree, but its speed is unsatisfactory on complex web page structures. To effectively raise both the speed and the accuracy of DOM sub-tree traversal, we propose a method called Semantics-based Extraction of Web page Main text.
Keywords :
Web sites; search engines; semantic Web; text analysis; DOM sub-tree; DOM tree traversing; Webpage main text; complex Webpage structure; search engine; semantics-based extraction; Accuracy; Computers; Data mining; Educational institutions; HTML; Navigation; Semantics; Extraction; Webpage
Conference_Title :
Semantics, Knowledge and Grids (SKG), 2012 Eighth International Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4673-2561-5
DOI :
10.1109/SKG.2012.47