Regional Information Center for Science and Technology - Notice of Violation of IEEE Publication Principles: Improving heterogeneous data clustering by using metadata and compression algorithms

DocumentCode :

525540

Title :

Notice of Violation of IEEE Publication Principles
Improving heterogeneous data clustering by using metadata and compression algorithms

Author :

Cernian, Alexandra ; Carstoiu, Dorin ; Sgarciu, Valentin

Author_Institution :

Fac. of Autom. Control & Comput. Sci., Politeh. Univ. of Bucharest, Bucharest, Romania

fYear :

2010

fDate :

24-26 June 2010

Firstpage :

169

Lastpage :

173

Abstract :

Notice of Violation of IEEE Publication Principles

"Improving Heterogeneous Data Clustering by Using Metadata and Compression Algorithms"
by Alexandra Cernian, Dorin Carstoiu, Valentin Sgarciu,
in the Proceedings of the 2010 Roedunet International Conference (RoEduNet), June 2010, pp. 169-173

After careful and considered review of the content and authorship of this paper by a duly constituted expert committee, this paper has been found to be in violation of IEEE's Publication Principles.

This paper contains portions of text from the paper(s) cited below. A credit notice is used, but due to the absence of quotation marks or offset text, copied material is not clearly referenced or specifically identified.

"Etude des Methodes de Classification par Compression"
by Tudor Basarab IONESCU,
published in Rapport interne 2005-06-28-DI-FB
http://wwwdi.supelec.fr/fb/download/Articles/Rapport_2005-06-28-DI-FB.pdf

Nowadays, we have to deal with a large quantity of unstructured, heterogeneous data produced by an increasing number of sources. Clustering heterogeneous data is essential for returning structured information in response to user queries. In this paper, we assess the results of a new clustering technique, clustering by compression, when applied to metadata associated with heterogeneous data sets. The clustering by compression procedure is based on a parameter-free, universal similarity distance, the normalized compression distance (NCD), computed from the lengths of compressed data files (singly and in pairwise concatenation). Experimental results show that using metadata could improve average clustering performance by about 20% compared with clustering the same sample data set without metadata.
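
As an illustration of the distance described in the abstract, the following is a minimal sketch of the NCD in its commonly cited form, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C(.) is the compressed length. The choice of zlib as the compressor and the helper names `compressed_len`, `ncd`, and `distance_matrix` are assumptions made for this example; the record does not say which compressor or implementation the paper used.

```python
import zlib


def compressed_len(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (compressor choice is an assumption)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings.

    Uses the compressed lengths of x and y singly and of their concatenation.
    """
    cx = compressed_len(x)
    cy = compressed_len(y)
    cxy = compressed_len(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def distance_matrix(items: list[bytes]) -> list[list[float]]:
    """Pairwise NCD values, suitable as input to a clustering step."""
    return [[ncd(a, b) for b in items] for a in items]
```

Values near 0 suggest the two inputs share most of their structure, and values near 1 suggest they share little. With a real compressor the result is only approximate, so NCD(x, x) is typically small but not exactly 0. To cluster on metadata as the abstract describes, each item passed to `distance_matrix` would be the serialized metadata of a file rather than its raw content.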

Keywords :

data compression; meta data; pattern clustering; compression algorithms; heterogeneous data clustering; metadata; normalized compression distance; sample data set; Automatic control; Clustering algorithms; Compression algorithms; Data mining; Internet; Keyword search; clustering by compression; heterogeneous data

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Roedunet International Conference (RoEduNet), 2010 9th

Conference_Location :

Sibiu

ISSN :

2068-1038

Print_ISBN :

978-1-4244-7335-9

Electronic_ISBN :

2068-1038

Type :

conf

Filename :

5541580

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=525540