DocumentCode :
1655252
Title :
Extracting Dense Bipartite Graph Block in Web Community Discovery
Author :
Nan Yang
Author_Institution :
Inf. Sch., Renmin Univ. of China, Beijing, China
fYear :
2013
Firstpage :
159
Lastpage :
166
Abstract :
Communities are an important structure on the Web, and discovering them is a challenging task. Many studies find communities by exhaustively extracting dense subgraphs. The pioneering works [1], [2] use a CBG (Complete Bipartite Graph) as the signature of a community core and discover many implicit communities. However, the CBG is too strict and excludes many possible community structures. Therefore, a DBG (Dense Bipartite Graph) is chosen as the signature instead. For instance, Reddy et al. [3] proposed a degree-based (α, β) density, while Gibson et al. [4] and Dourisboure et al. [5] use a ratio-based γ-dense function to measure the density of a DBG. In this paper, we analyze these two density measurements and point out that at low density the structure of the bipartite graph may be unreasonable because of the existence of cutting nodes. For this reason, we introduce the DBGB (Dense Bipartite Graph Block). We then employ a two-step expansion to construct the bipartite graph, which reduces the number of unnecessary nodes and edges. To obtain an optimal bipartite structure, we propose the max DBGB and design an extraction algorithm. The new method is tested on 4 datasets collected by a Web crawler, and dense cores are extracted. We check 200 randomly sampled cores, and 89 percent of them make sense. We also apply Dourisboure's method to one of the datasets at different scales, and the extracted cores contain many cutting nodes. The experimental results therefore show that our method is effective.
Keywords :
Internet; data mining; directed graphs; information retrieval; random processes; sampling methods; social networking (online); Web community discovery; Web crawler; cutting nodes; dense bipartite graph block extraction; dense cores; density measurements; extracting algorithm; max DBGB; optimal bipartite structure; random sampling cores; two-step expansion; Algorithm design and analysis; Bipartite graph; Communities; Density measurement; Educational institutions; Fans; Organizations; dense bipartite graph; link analysis; web communities;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Web Information System and Application Conference (WISA), 2013 10th
Conference_Location :
Yangzhou
Print_ISBN :
978-1-4799-3218-4
Type :
conf
DOI :
10.1109/WISA.2013.38
Filename :
6778629