Title :
Beyond the Web Graph: Mining the Information Architecture of the WWW with Navigation Structure Graphs
Author :
Keller, Matthias ; Nussbaumer, Martin
Author_Institution :
Steinbuch Centre for Computing (SCC), Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany
Abstract :
Large Web sites contain a plethora of different menus and navigation aids, which implement systems of content organization as hierarchies, linear structures or matrices. Humans are able to decode the fine-grained content organization because they are aware of the different access methods provided by navigation systems and understand the higher-level information architecture. In contrast, current methods of link analysis cannot extract such a detailed model of the information architecture and are not able to recognize site boundaries and content hierarchies the way humans do. In this paper, we present a new approach to mining navigation systems that increases the precision of Web structure mining. Instead of analyzing the complete Web graph spanned by pages and hyperlinks, we analyze subgraphs called Navigation Structure Graphs (NSGs). An NSG represents the hyperlinks belonging to a certain navigation system. We demonstrate the capabilities of NSGs for analyzing the organization of Web sites and present our research on mining NSGs.
Keywords :
Web sites; data mining; graph theory; WWW; Web graph; Web structure mining; content organization; information architecture mining; link analysis; navigation structure graph; navigation system mining; Cascading style sheets; Humans; Information architecture; Navigation; Organizations; Visualization; hierarchy extraction
Conference_Title :
Emerging Intelligent Data and Web Technologies (EIDWT), 2011 International Conference on
Conference_Location :
Tirana
Print_ISBN :
978-1-4577-0840-4
DOI :
10.1109/EIDWT.2011.23