• DocumentCode
    3007099
  • Title

    Towards automatic clustering of similar pages in web applications

  • Author

    De Lucia, Andrea ; Risi, Michele ; Tortora, Genoveffa ; Scanniello, Giuseppe

  • Author_Institution
    Dipt. di Mat. e Inf., Univ. of Salerno, Fisciano, Italy
  • fYear
    2009
  • fDate
    25-26 Sept. 2009
  • Firstpage
    99
  • Lastpage
    108
  • Abstract
    In this paper, we propose an automatic approach to grouping web pages that are similar at the content level. The approach uses the Levenshtein string edit distance and Latent Semantic Indexing to compute page dissimilarity, and then groups the pages by iteratively applying a graph-theoretic clustering algorithm. To automate the clustering process, a prototype has been implemented and used to assess the proposed approach on three web sites.
  • Keywords
    Web sites; content-based retrieval; graph theory; indexing; pattern clustering; string matching; Web site; graph theoretic clustering algorithm; group Web page; latent semantic indexing; Levenshtein string edit distance; Atmospheric measurements; Clustering algorithms; Navigation; Particle measurements; Prototypes; Weight measurement
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Web Systems Evolution (WSE), 2009 11th IEEE International Symposium on
  • Conference_Location
    Edmonton, AB
  • ISSN
    1550-4441
  • Print_ISBN
    978-1-4244-5124-1
  • Type

    conf

  • DOI
    10.1109/WSE.2009.5631253
  • Filename
    5631253