Title :
Similarity Computation of Web Pages of Focused Crawler
Author :
Yu, Huo Ling ; Bingwu, Liu ; Fang, Yan
Author_Institution :
Sch. of Inf., Beijing Wuzi Univ., Beijing, China
Abstract :
Due to the dynamic nature of the Web, finding relevant and recent information is becoming harder. More and more people now use focused crawlers to gather information in their specialized fields. However, similarity computation based on text alone is inadequate, because a page consists not only of text but also of multimedia content such as images, audio, and video. For a focused crawler, page structure also plays a key role in similarity computation. In this paper we introduce a new method that computes similarity according to both page structure and content, which makes Web page similarity computation more accurate and crawling more efficient, benefiting Web analysis and making it easier for users to obtain information.
Keywords :
Internet; Web analysis; Web page similarity computation; page content; page structure; similarity computation; Computational modeling; Crawlers; HTML; Head; Indexing; Shape; Web pages; Content Similarity; Focused Crawler; Page Structure; Similarity computation;
Conference_Title :
Information Technology and Applications (IFITA), 2010 International Forum on
Conference_Location :
Kunming
Print_ISBN :
978-1-4244-7621-3
Electronic_ISBN :
978-1-4244-7622-0
DOI :
10.1109/IFITA.2010.308