Classified information: the data clustering problem

Author

Memarsadeghi, Nargess ; O'Leary, D.P.

Author_Institution

Dept. of Comput. Sci., Maryland Univ., MD, USA

Volume

5

Issue

5

fYear

2003

Firstpage

54

Lastpage

60

Abstract

Many projects in engineering and science require data classification based on different heuristics. Designers, for example, classify automobile engine performance as acceptable or unacceptable based on a combination of efficiency, emissions, noise levels, and other criteria. Researchers routinely classify documents as "relevant to the current project" or "irrelevant". Genome decoding divides chromosomes into genes, regulatory regions, signals, and so on. Pathologists identify cells as cancerous or benign. We can classify data into different groups by clustering data that are close with respect to some distance measure. In this project, we investigate the design, use, and pitfalls of a popular clustering algorithm, the k-means algorithm.
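
The abstract names the k-means algorithm and a distance measure but, being an abstract, gives no procedure. As a rough illustration only, the following is a minimal sketch of the standard Lloyd-style iteration, not the authors' own code. It assumes squared Euclidean distance as the distance measure and random data points as initial centers; the names `kmeans`, `squared_distance`, and the toy `data` list are hypothetical and chosen for this example.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iters=100, seed=0):
    """Lloyd-style k-means sketch: returns (centers, labels)."""
    rng = random.Random(seed)
    # Assumption: initialize the centers as k data points chosen at random.
    centers = [tuple(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest center.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the centers will too
        labels = new_labels
        # Update step: move each center to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                dim = len(members[0])
                centers[j] = tuple(
                    sum(m[d] for m in members) / len(members) for d in range(dim)
                )
    return centers, labels


# Hypothetical toy data: two well-separated groups in the plane.
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
        (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, labels = kmeans(data, k=2)
```

On data this well separated the two groups are recovered regardless of the seed. In general the result of this iteration can depend on the initial centers and on the choice of k, which is one kind of pitfall a study of the algorithm has to consider.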

Keywords

biology computing; cancer; cellular biophysics; genetics; pattern classification; pattern clustering; benign cells; cancerous cells; chromosomes; data classification; data clustering; distance measure; genes; genome decoding; k-means algorithm; pathology; regulatory regions; signals; Automobiles; Automotive engineering; Bioinformatics; Clustering algorithms; Data engineering; Decoding; Design engineering; Engines; Genomics; Noise level

fLanguage

English

Journal_Title

Computing in Science & Engineering

Publisher

IEEE

ISSN

1521-9615

Type

jour

DOI

10.1109/MCISE.2003.1225861

Filename

1225861