  • DocumentCode
    2727276
  • Title
    A Comparison of Dimensionality Reduction Techniques for Web Structure Mining
  • Author
    Chikhi, Nacim Fateh ; Rothenburger, Bernard ; Aussenac-Gilles, Nathalie
  • Author_Institution
    Univ. Paul Sabatier, Toulouse
  • fYear
    2007
  • fDate
    2-5 Nov. 2007
  • Firstpage
    116
  • Lastpage
    119
  • Abstract
    In many domains, dimensionality reduction techniques have been shown to be very effective for elucidating the underlying semantics of data. In this paper, we therefore investigate the use of various dimensionality reduction techniques (DRTs) to extract the implicit structures hidden in Web hyperlink connectivity. We apply and compare four DRTs, namely principal component analysis (PCA), non-negative matrix factorization (NMF), independent component analysis (ICA) and random projection (RP). Experiments conducted on three datasets allow us to assert the following: NMF outperforms PCA and ICA in terms of stability and interpretability of the discovered structures; and the well-known WebKB dataset, used in a large number of works on the analysis of hyperlink connectivity, appears ill-suited to this task, so we suggest using the more recent Wikipedia dataset instead.
  • Keywords
    data mining; independent component analysis; matrix decomposition; principal component analysis; Web hyperlink connectivity; Web structure mining; Wikipedia dataset; dimensionality reduction; non-negative matrix factorization; random projection; Algorithm design and analysis; Intelligent structures; Stability analysis; Topology; Web mining; Web pages; Wikipedia
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Web Intelligence, IEEE/WIC/ACM International Conference on
  • Conference_Location
    Fremont, CA
  • Print_ISBN
    978-0-7695-3026-0
  • Type
    conf
  • DOI
    10.1109/WI.2007.86
  • Filename
    4427077