DocumentCode
2727276
Title
A Comparison of Dimensionality Reduction Techniques for Web Structure Mining
Author
Chikhi, Nacim Fateh ; Rothenburger, Bernard ; Aussenac-Gilles, Nathalie
Author_Institution
Univ. Paul Sabatier, Toulouse
fYear
2007
fDate
2-5 Nov. 2007
Firstpage
116
Lastpage
119
Abstract
In many domains, dimensionality reduction techniques have been shown to be very effective for elucidating the underlying semantics of data. In this paper we investigate the use of various dimensionality reduction techniques (DRTs) to extract the implicit structures hidden in Web hyperlink connectivity. We apply and compare four DRTs, namely principal component analysis (PCA), non-negative matrix factorization (NMF), independent component analysis (ICA), and random projection (RP). Experiments conducted on three datasets support two conclusions: NMF outperforms PCA and ICA in terms of stability and interpretability of the discovered structures; and the well-known WebKb dataset, used in a large number of works on hyperlink connectivity analysis, appears poorly suited to this task, so we suggest using the recent Wikipedia dataset instead.
Keywords
data mining; independent component analysis; matrix decomposition; principal component analysis; Web hyperlink connectivity; Web structure mining; Wikipedia dataset; dimensionality reduction; non-negative matrix factorization; random projection; Algorithm design and analysis; Intelligent structures; Stability analysis; Topology; Web mining; Web pages; Wikipedia
fLanguage
English
Publisher
IEEE
Conference_Titel
Web Intelligence, IEEE/WIC/ACM International Conference on
Conference_Location
Fremont, CA
Print_ISBN
978-0-7695-3026-0
Type
conf
DOI
10.1109/WI.2007.86
Filename
4427077