DocumentCode :
2069898
Title :
Web Page's Blocks Based Topical Crawler
Author :
Zhang, Weifeng ; Xu, Baowen ; Lu, Hong
Author_Institution :
Coll. of Comput., Nanjing Univ. of Posts & Telecommun., Nanjing, China
fYear :
2008
fDate :
18-19 Dec. 2008
Firstpage :
44
Lastpage :
49
Abstract :
Link context is widely used in information retrieval and classification. In topical (vertical) crawlers, link context is used to predict whether a link is related to the target topics. A link's context usually consists of the anchor text of the link, the text of the whole web page, or the words within a fixed-size window around the link. Each choice has a drawback: the full page text often covers too many themes, anchor text is too simple, and the right window size is hard to determine. In this paper, we propose determining the scope of link context by web page block techniques, since links in the same block are more closely related. A corner classification based neural network is used to represent and filter the topics. Our experiments show that web crawlers using web page block based link context achieve better accuracy, and that the corner classification neural network is suitable for representing and filtering topics.
Keywords :
Web sites; neural nets; Web crawlers; Web page block; corner classification; link context; neural network; topical crawler; Artificial neural networks; Biological neural networks; Crawlers; Educational institutions; Information filtering; Information filters; Information retrieval; Neural networks; Search engines; Web pages; crawler; topic; web page block
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Service-Oriented System Engineering, 2008. SOSE '08. IEEE International Symposium on
Conference_Location :
Jhongli
Print_ISBN :
978-0-7695-3499-2
Electronic_ISBN :
978-0-7695-3499-2
Type :
conf
DOI :
10.1109/SOSE.2008.10
Filename :
4730461