Title :
Clustering of short commercial documents for the web
Author :
Carullo, Moreno ; Binaghi, Elisabetta ; Gallo, Ignazio ; Lamberti, Nicola
Author_Institution :
Dipt. di Inf. e Comun., Univ. degli Studi dell'Insubria, Varese
Abstract :
Document clustering techniques have been applied in several areas, with the Web as one of the most recent and influential. Both general-purpose and text-oriented techniques exist and can be used to cluster a collection of documents in many ways. In this work we propose an online, single-pass document clustering model that can be combined with a variety of text-oriented similarity measures. An experimental evaluation of the proposed model was conducted in the e-commerce domain. Performance was measured using a clustering-oriented metric based on F-Measure and compared with that obtained by other well-known approaches.
Keywords :
Internet; pattern clustering; text analysis; Web search; general-purpose technique; single-pass document clustering; text-oriented technique; Algorithm design and analysis; Clustering algorithms; Clustering methods; Electronic commerce; Encoding; Particle measurements; Performance evaluation
Conference_Title :
Pattern Recognition, 2008. ICPR 2008. 19th International Conference on
Conference_Location :
Tampa, FL
Print_ISBN :
978-1-4244-2174-9
ISSN :
1051-4651
DOI :
10.1109/ICPR.2008.4761554