Title :
Study on the elimination of duplicated multimedia webpages
Author_Institution :
Sch. of Commun., Shandong Normal Univ., Jinan, China
Abstract :
Multimedia web resources contain many duplicated web pages; eliminating these duplicates reduces storage costs and improves search engine performance. Based on an analysis of the classic algorithm for eliminating duplicates, this article proposes an improved algorithm for detecting repeated web page text. The new algorithm performs elimination on the basis of webpage contents, which serve as the feature vector when measuring how closely webpages approximate one another, and it analyzes how to capture a web page's theme. In this way, the elimination of duplicated multimedia webpages can be improved along several dimensions.
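The abstract does not spell out the improved algorithm, so the following is only a minimal sketch of the general idea of comparing webpages through content-based feature vectors. It assumes term-frequency vectors and cosine similarity, which may differ from the paper's actual method; the function names and the 0.9 threshold are hypothetical and chosen purely for illustration.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Build a sparse term-count vector from page text (assumed feature choice)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty page shares nothing with any other page
    return dot / (norm_a * norm_b)


def is_duplicate(text_a, text_b, threshold=0.9):
    """Flag two pages as duplicates when their similarity reaches the
    threshold (hypothetical value, not taken from the paper)."""
    return cosine_similarity(term_vector(text_a), term_vector(text_b)) >= threshold
```

In such a scheme, pages whose extracted text is identical or nearly identical score close to 1, while pages with no terms in common score 0. A real system would likely restrict the vector to theme-bearing terms rather than all page text.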
Keywords :
Web sites; information retrieval; multimedia systems; search engines; text analysis; Web page text repetition; Web page theme capturing; Webpage approximation; classic algorithm; duplicated multimedia Webpage elimination process; duplicated page removal; multimedia Web resources; search engine performance improvement; storage cost reduction; vector characteristics; Accuracy; Algorithm design and analysis; Approximation algorithms; Feature extraction; Multimedia communication; Search engines; Web pages; Duplicates checking Algorithm; Feature code; Multimedia webpages; Topic search;
Conference_Title :
Systems and Informatics (ICSAI), 2012 International Conference on
Conference_Location :
Yantai
Print_ISBN :
978-1-4673-0198-5
DOI :
10.1109/ICSAI.2012.6223520