DocumentCode :
525540
Title :
Notice of Violation of IEEE Publication Principles
Improving heterogeneous data clustering by using metadata and compression algorithms
Author :
Cernian, Alexandra ; Carstoiu, Dorin ; Sgarciu, Valentin
Author_Institution :
Fac. of Autom. Control & Comput. Sci., Politeh. Univ. of Bucharest, Bucharest, Romania
fYear :
2010
fDate :
24-26 June 2010
Firstpage :
169
Lastpage :
173
Abstract :
Notice of Violation of IEEE Publication Principles

"Improving Heterogeneous Data Clustering by Using Metadata and Compression Algorithms"
by Alexandra Cernian, Dorin Carstoiu, Valentin Sgarciu,
in the Proceedings of the 2010 Roedunet International Conference (RoEduNet), June 2010, pp. 169-173

After careful and considered review of the content and authorship of this paper by a duly constituted expert committee, this paper has been found to be in violation of IEEE's Publication Principles.

This paper contains portions of text from the paper(s) cited below. A credit notice is used, but due to the absence of quotation marks or offset text, copied material is not clearly referenced or specifically identified.

"Etude des Methodes de Classification par Compression" [A Study of Classification Methods Based on Compression]
by Tudor Basarab IONESCU,
published as internal report (Rapport interne) 2005-06-28-DI-FB
http://wwwdi.supelec.fr/fb/download/Articles/Rapport_2005-06-28-DI-FB.pdf

Nowadays, we have to deal with a large quantity of unstructured, heterogeneous data produced by an increasing number of sources. Clustering heterogeneous data is essential for obtaining structured information in response to user queries. In this paper, we assess the results of a new clustering technique, clustering by compression, when applied to the metadata associated with heterogeneous data sets. The clustering-by-compression procedure is based on a parameter-free, universal similarity distance, the normalized compression distance (NCD), computed from the lengths of compressed data files (both individually and as pairwise concatenations). Experimental results show that using metadata could improve average clustering performance by about 20% compared with clustering the same sample data set without metadata.
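To make the NCD computation concrete, here is a minimal sketch assuming the standard definition NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C(.) is the compressed length. The record does not name the compressor the authors used; zlib is an assumption here, and the function names and sample metadata strings are hypothetical. Values near 0 indicate similar inputs and values near 1 indicate unrelated ones; the resulting pairwise matrix is what a clustering step (for example, hierarchical clustering) would consume.

```python
import zlib


def compressed_len(data: bytes) -> int:
    # C(x): size in bytes of zlib-compressed data (compressor choice is an assumption).
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    # Normalized compression distance from single and concatenated compressed lengths.
    cx = compressed_len(x)
    cy = compressed_len(y)
    cxy = compressed_len(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def ncd_matrix(items: list[bytes]) -> list[list[float]]:
    # Pairwise NCD matrix over all items, used as input to a clustering algorithm.
    return [[ncd(a, b) for b in items] for a in items]


if __name__ == "__main__":
    # Hypothetical metadata records standing in for heterogeneous documents.
    docs = [
        b"title: clustering; author: smith; year: 2010 " * 20,
        b"title: clustering; author: jones; year: 2010 " * 20,
        b"0123456789" * 100,
    ]
    for row in ncd_matrix(docs):
        print(" ".join(f"{d:.2f}" for d in row))
```

Because real compressors are not ideal, NCD(x, x) is small but typically not exactly 0, so comparisons between entries are more meaningful than absolute values.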
Keywords :
data compression; meta data; pattern clustering; compression algorithms; heterogeneous data clustering; metadata; normalized compression distance; sample data set; Automatic control; Clustering algorithms; Compression algorithms; Data mining; Internet; Keyword search; clustering by compression; heterogeneous data; metadata; normalized compression distance;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Roedunet International Conference (RoEduNet), 2010 9th
Conference_Location :
Sibiu
ISSN :
2068-1038
Print_ISBN :
978-1-4244-7335-9
Electronic_ISBN :
2068-1038
Type :
conf
Filename :
5541580