• DocumentCode
    2484243
  • Title

    Clustering of short commercial documents for the web

  • Author

    Carullo, Moreno ; Binaghi, Elisabetta ; Gallo, Ignazio ; Lamberti, Nicola

  • Author_Institution
    Dipt. di Inf. e Comun., Univ. degli Studi dell'Insubria, Varese
  • fYear
    2008
  • fDate
    8-11 Dec. 2008
  • Firstpage
    1
  • Lastpage
    4
  • Abstract
    Document clustering techniques have been applied in several areas, with the Web among the most recent and influential. Both general-purpose and text-oriented techniques exist and can be used to cluster a collection of documents in many ways. In this work we propose an online, single-pass document clustering model that can be combined with a variety of text-oriented similarity measures. An experimental evaluation of the proposed model was conducted in the e-commerce domain. Performance was measured using a clustering-oriented metric based on the F-measure and compared with that obtained by other well-known approaches.
  • Keywords
    Internet; pattern clustering; text analysis; Web search; general-purpose technique; single-pass document clustering; text-oriented technique; Algorithm design and analysis; Clustering algorithms; Clustering methods; Electronic commerce; Encoding; Particle measurements; Performance evaluation
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Pattern Recognition, 2008. ICPR 2008. 19th International Conference on
  • Conference_Location
    Tampa, FL
  • ISSN
    1051-4651
  • Print_ISBN
    978-1-4244-2174-9
  • Electronic_ISBN
    1051-4651
  • Type

    conf

  • DOI
    10.1109/ICPR.2008.4761554
  • Filename
    4761554