DocumentCode :
2396965
Title :
Study on the elimination of duplicated multimedia webpages
Author :
Yang, Xiaojuan
Author_Institution :
Sch. of Commun., Shandong Normal Univ., Jinan, China
fYear :
2012
fDate :
19-20 May 2012
Firstpage :
2325
Lastpage :
2328
Abstract :
Multimedia web resources contain many duplicated web pages; eliminating these duplicates reduces storage costs and improves search engine performance. Based on an analysis of the classic duplicate-elimination algorithm, this article proposes an improved algorithm for judging web page text repetition. The new algorithm runs the elimination process on the basis of webpage content, which is used as the feature vector when comparing the webpages' degree of approximation, and analyzes how to capture a web page's theme. Hence, we can make a multidimensional improvement in the elimination of duplicated multimedia webpages.
Keywords :
Web sites; information retrieval; multimedia systems; search engines; text analysis; Web page text repetition; Web page theme capturing; Webpage approximation; classic algorithm; duplicated multimedia Webpage elimination process; duplicated page removal; multimedia Web resources; search engine performance improvement; storage cost reduction; vector characteristics; Accuracy; Algorithm design and analysis; Approximation algorithms; Feature extraction; Multimedia communication; Search engines; Web pages; Duplicates checking Algorithm; Feature code; Multimedia webpages; Topic search;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Systems and Informatics (ICSAI), 2012 International Conference on
Conference_Location :
Yantai
Print_ISBN :
978-1-4673-0198-5
Type :
conf
DOI :
10.1109/ICSAI.2012.6223520
Filename :
6223520