Title :
Classified information: the data clustering problem
Author :
Memarsadeghi, Nargess; O'Leary, D.P.
Author_Institution :
Dept. of Comput. Sci., Maryland Univ., MD, USA
Abstract :
Many projects in engineering and science require data classification based on different heuristics. Designers, for example, classify automobile engine performance as acceptable or unacceptable based on a combination of efficiency, emissions, noise levels, and other criteria. Researchers routinely classify documents as "relevant to the current project" or "irrelevant". Genome decoding divides chromosomes into genes, regulatory regions, signals, and so on. Pathologists identify cells as cancerous or benign. One way to classify data into groups is to cluster data points that are close to one another with respect to some distance measure. In this project, we investigate the design, use, and pitfalls of a popular clustering algorithm, the k-means algorithm.
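As an illustration only, here is a minimal sketch of the standard Lloyd iteration for k-means. Each point is assigned to its nearest center, then each center moves to the mean of its assigned points, and this repeats until the assignments stop changing. The sketch assumes squared Euclidean distance and initial centers drawn at random from the data. The function names (`kmeans`, `sq_dist`, `mean`) and the `seed` parameter are hypothetical and are not taken from the article.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance (assumed distance measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points: Sequence[Point]) -> Point:
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points: Sequence[Point], k: int, max_iter: int = 100,
           seed: int = 0) -> Tuple[List[Point], List[int]]:
    """Lloyd's iteration: alternate assignment and centroid-update steps."""
    rng = random.Random(seed)
    # Assumed initialization: k distinct data points chosen at random.
    centroids: List[Point] = rng.sample(list(points), k)
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments unchanged, so the iteration has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # a cluster can become empty; keep its old centroid
                centroids[j] = mean(members)
    return centroids, labels


data = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
        (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
centroids, labels = kmeans(data, k=2)
print(labels)
```

Because the result depends on the initial centers, k-means can settle in a poor local minimum. A common safeguard is to run it with several different `seed` values and keep the run with the smallest total within-cluster squared distance.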
Keywords :
biology computing; cancer; cellular biophysics; genetics; pattern classification; pattern clustering; benign cells; cancerous cells; chromosomes; data classification; data clustering; distance measure; genes; genome decoding; k-means algorithm; pathology; regulatory regions; signals; Automobiles; Automotive engineering; Bioinformatics; Clustering algorithms; Data engineering; Decoding; Design engineering; Engines; Genomics; Noise level;
Journal_Title :
Computing in Science & Engineering
DOI :
10.1109/MCISE.2003.1225861