Title :
Visual similarity comparison for Web page retrieval
Author :
Takama, Yasufumi ; Mitsuhashi, Noriaki
Author_Institution :
Tokyo Metropolitan Univ., Japan
Abstract :
A method for comparing Web pages in terms of visual similarity is proposed. Conventional Web information retrieval and gathering systems, such as search engines, extract keywords from HTML source files and calculate the similarity between pages from those keywords. The extracted keywords serve as semantic features representing the contents of Web pages. However, the visual features of Web pages are as important as their semantic features, because HTML is designed to render a Web page in a manner that humans can understand. The proposed method compares the layouts of Web pages using image processing and graph matching. Experimental results show that the layout analysis achieves an average accuracy of 91.6%, and that the visual similarity calculated by the proposed method is closer to the visual judgments of test subjects than the similarity calculated by a color-based comparison method.
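The abstract does not specify how the layout graphs are built or matched. As a rough, hypothetical illustration of layout comparison by graph matching, the sketch below assumes each page has already been segmented into rectangular blocks by an image-processing step that is not shown. Each block is a node, node similarity is the intersection-over-union (IoU) of the block rectangles, and each pair of nodes carries an edge label giving their relative position (left/right, above/below). The names `iou`, `relation`, `layout_similarity` and the weight `alpha` are illustrative assumptions, not the authors' method.

```python
from itertools import permutations
from typing import List, Tuple

# Hypothetical block representation: (x, y, width, height), normalized to [0, 1].
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned rectangles (node similarity)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def relation(a: Box, b: Box) -> Tuple[int, int]:
    """Edge label: sign of the horizontal and vertical offset from a's center to b's."""
    dx = (b[0] + b[2] / 2) - (a[0] + a[2] / 2)
    dy = (b[1] + b[3] / 2) - (a[1] + a[3] / 2)
    return ((dx > 0) - (dx < 0), (dy > 0) - (dy < 0))


def layout_similarity(p: List[Box], q: List[Box], alpha: float = 0.5) -> float:
    """Score in [0, 1] from the best block-to-block assignment between two layouts.

    Brute-forces every injective mapping of the smaller layout into the larger
    one, so it is only practical for a handful of blocks per page.
    """
    if len(p) > len(q):
        p, q = q, p
    if not q:  # both layouts empty
        return 1.0
    best = 0.0
    for perm in permutations(range(len(q)), len(p)):
        # Node term: summed IoU of matched blocks (at most len(p)).
        node = sum(iou(p[i], q[j]) for i, j in enumerate(perm))
        # Edge term: fraction of matched pairs whose relative position agrees.
        pairs = agree = 0
        for i in range(len(p)):
            for k in range(i + 1, len(p)):
                pairs += 1
                if relation(p[i], p[k]) == relation(q[perm[i]], q[perm[k]]):
                    agree += 1
        edge = agree / pairs if pairs else 1.0
        # Divide by len(q) so unmatched blocks in the larger layout lower the score.
        score = alpha * node / len(q) + (1 - alpha) * edge * len(p) / len(q)
        best = max(best, score)
    return best


if __name__ == "__main__":
    page = [(0.0, 0.0, 1.0, 0.2), (0.0, 0.2, 0.3, 0.8), (0.3, 0.2, 0.7, 0.8)]
    other = [(0.0, 0.8, 1.0, 0.2), (0.7, 0.0, 0.3, 0.8), (0.0, 0.0, 0.7, 0.8)]
    print(layout_similarity(page, page))   # approximately 1.0 for identical layouts
    print(layout_similarity(page, other))  # lower: header moved to bottom, sidebar to right
```

Combining a node term and an edge term means two pages score highly only when their blocks overlap and are also arranged the same way relative to each other. A practical implementation would replace the exhaustive search with an approximate or polynomial-time matching step.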
Keywords :
Web sites; data visualisation; hypermedia markup languages; image matching; image processing; information retrieval; search engines; HTML source file; Web gathering system; Web information retrieval; Web page retrieval; Web page visualization; graph matching; image processing; keyword extraction; search engine; visual similarity; Data mining; HTML; Humans; Image processing; Information retrieval; Search engines; Testing; Usability; Visualization; Web pages;
Conference_Title :
Web Intelligence, 2005. Proceedings. The 2005 IEEE/WIC/ACM International Conference on
Print_ISBN :
0-7695-2415-X
DOI :
10.1109/WI.2005.157