DocumentCode :
3253044
Title :
Algorithm for detecting dynamic webpage and its importance
Author :
Sultania, A.K.
Author_Institution :
Freescale Semicond. Pvt Ltd., Noida, India
fYear :
2012
fDate :
21-22 Dec. 2012
Firstpage :
257
Lastpage :
259
Abstract :
During web search (crawling, indexing, relevance), many duplicate web pages are found under different URLs; crawlers normalize these URLs before using them. Many web pages are also dynamic: the same URL returns different content at different times. In this paper, we discuss the need to detect these dynamic web pages and propose an algorithm to identify this dynamism. URLs can be normalized using the methods explained in [1], [2], and [7], or using the DUST algorithm [3], but a dynamic web page must be identified before normalization. Combining the proposed algorithm with DUST rules is expected to improve the detection rate of dynamic web pages, reducing the time spent on crawling, indexing, etc.
Keywords :
Web sites; indexing; information retrieval; DUST algorithm; URL; Web search; crawling; duplicate Web-pages; dynamic Webpage detection; indexing; Conferences; Heuristic algorithms; Indexing; Radar tracking; Search engines; Web search; World Wide Web; Search engine; URL normalization; Webpage de-duplication; duplicate detection; dynamic webpage;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Radar, Communication and Computing (ICRCC), 2012 International Conference on
Conference_Location :
Tiruvannamalai
Print_ISBN :
978-1-4673-2756-5
Type :
conf
DOI :
10.1109/ICRCC.2012.6450590
Filename :
6450590