DocumentCode
773368
Title
Classified information: the data clustering problem
Author
Memarsadeghi, Nargess; O'Leary, D.P.
Author_Institution
Dept. of Comput. Sci., Maryland Univ., MD, USA
Volume
5
Issue
5
fYear
2003
Firstpage
54
Lastpage
60
Abstract
Many projects in engineering and science require data classification based on different heuristics. Designers, for example, classify automobile engine performance as acceptable or unacceptable based on a combination of efficiency, emissions, noise levels, and other criteria. Researchers routinely classify documents as "relevant to the current project" or "irrelevant". Genome decoding divides chromosomes into genes, regulatory regions, signals, and so on. Pathologists identify cells as cancerous or benign. We can classify data into different groups by clustering data that are close with respect to some distance measure. In this project, we investigate the design, use, and pitfalls of a popular clustering algorithm, the k-means algorithm.
Keywords
biology computing; cancer; cellular biophysics; genetics; pattern classification; pattern clustering; benign cells; cancerous cells; chromosomes; data classification; data clustering; distance measure; genes; genome decoding; k-means algorithm; pathology; regulatory regions; signals; Automobiles; Automotive engineering; Bioinformatics; Clustering algorithms; Data engineering; Decoding; Design engineering; Engines; Genomics; Noise level
fLanguage
English
Journal_Title
Computing in Science & Engineering
Publisher
IEEE
ISSN
1521-9615
Type
jour
DOI
10.1109/MCISE.2003.1225861
Filename
1225861