DocumentCode :
795901
Title :
Visualizing the structure of Web communities based on data acquired from a search engine
Author :
Murata, Tsuyoshi
Author_Institution :
Nat. Inst. of Informatics, Tokyo, Japan
Volume :
50
Issue :
5
fYear :
2003
Firstpage :
860
Lastpage :
866
Abstract :
Discovery of Web communities, groups of Web pages sharing common interests, is important for assisting users' information retrieval from the Web. This paper describes a method for visualizing Web communities and their internal structures. Visualizing Web communities as graphs enables users to access related pages easily, and the resulting graphs often reflect the characteristics of the communities. Since related Web pages are often referenced together by the same Web page, the number of co-occurring references reported by a search engine is used to measure the relation among pages. Two URLs are given to a search engine as keywords, and the number of pages retrieved for both URLs divided by the number of pages retrieved for either URL, called the Jaccard coefficient, is calculated as the criterion for evaluating the relation between the two URLs. This value determines the length of an edge in the graph, so that vertices of related pages are located close to each other. Our visualization system based on this method succeeds in clarifying various genres of Web communities, even though the system does not interpret the contents of the pages. The Jaccard coefficient is easy for computer systems to calculate, and it is suitable for visualization based on data acquired from a search engine.
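The Jaccard computation described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's implementation. The hit counts are taken as given numbers (no real search-engine API is called). The size of the union is derived by inclusion-exclusion from the single-URL and both-URL counts, although an engine that supports OR queries could report it directly. The helper names `jaccard_from_hits` and `edge_lengths` and the mapping `length = 1 - J` are hypothetical choices, since the abstract does not say how the coefficient is converted into an edge length.

```python
from itertools import combinations


def jaccard_from_hits(hits_a: int, hits_b: int, hits_both: int) -> float:
    """Estimate the Jaccard coefficient |A ∩ B| / |A ∪ B| from hit counts.

    hits_a:    pages returned when URL A alone is the query keyword
    hits_b:    pages returned when URL B alone is the query keyword
    hits_both: pages returned when both URLs are given as keywords
    Assumption: |A ∪ B| is obtained by inclusion-exclusion.
    """
    union = hits_a + hits_b - hits_both
    if union <= 0:
        return 0.0  # no pages mention either URL, so treat the pair as unrelated
    return hits_both / union


def edge_lengths(hits: dict, co_hits: dict) -> dict:
    """Map every URL pair to an edge length; more related pairs get shorter edges.

    hits:    URL -> hit count for that URL alone
    co_hits: frozenset({u, v}) -> hit count for the query containing both URLs
    Hypothetical mapping: length = 1 - J (J = 1 gives length 0, J = 0 gives 1).
    """
    lengths = {}
    for u, v in combinations(sorted(hits), 2):
        both = co_hits.get(frozenset((u, v)), 0)  # missing pair: no co-occurrence
        lengths[(u, v)] = 1.0 - jaccard_from_hits(hits[u], hits[v], both)
    return lengths
```

For example, with hypothetical counts of 100 and 80 pages for two URLs and 60 pages containing both, the union is 100 + 80 - 60 = 120, so J = 60 / 120 = 0.5 and the edge length is 0.5. A graph-layout routine that accepts target distances between vertices could then place strongly co-referenced pages near each other.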
Keywords :
Web sites; data visualisation; search engines; Jaccard coefficient; URL; Web communities structure visualisation; Web pages; information retrieval; internal structures; search engine; Data visualization; Embryo; Helium; Informatics; Information retrieval; Internet; Linux; Search engines; Uniform resource locators; Web pages;
fLanguage :
English
Journal_Title :
Industrial Electronics, IEEE Transactions on
Publisher :
IEEE
ISSN :
0278-0046
Type :
jour
DOI :
10.1109/TIE.2003.817486
Filename :
1234432