DocumentCode :
773368
Title :
Classified information: the data clustering problem
Author :
Memarsadeghi, Nargess ; O'Leary, D.P.
Author_Institution :
Dept. of Comput. Sci., Maryland Univ., MD, USA
Volume :
5
Issue :
5
fYear :
2003
Firstpage :
54
Lastpage :
60
Abstract :
Many projects in engineering and science require data classification based on different heuristics. Designers, for example, classify automobile engine performance as acceptable or unacceptable based on a combination of efficiency, emissions, noise levels, and other criteria. Researchers routinely classify documents as "relevant to the current project" or "irrelevant". Genome decoding divides chromosomes into genes, regulatory regions, signals, and so on. Pathologists identify cells as cancerous or benign. We can classify data into different groups by clustering data that are close with respect to some distance measure. In this project, we investigate the design, use, and pitfalls of a popular clustering algorithm, the k-means algorithm.
Keywords :
biology computing; cancer; cellular biophysics; genetics; pattern classification; pattern clustering; benign cells; cancerous cells; chromosomes; data classification; data clustering; distance measure; genes; genome decoding; k-means algorithm; pathology; regulatory regions; signals; Automobiles; Automotive engineering; Bioinformatics; Clustering algorithms; Data engineering; Decoding; Design engineering; Engines; Genomics; Noise level
fLanguage :
English
Journal_Title :
Computing in Science & Engineering
Publisher :
IEEE
ISSN :
1521-9615
Type :
jour
DOI :
10.1109/MCISE.2003.1225861
Filename :
1225861