  • DocumentCode
    773368
  • Title
    Classified information: the data clustering problem
  • Author
    Memarsadeghi, Nargess; O'Leary, D.P.
  • Author_Institution
    Dept. of Comput. Sci., Maryland Univ., MD, USA
  • Volume
    5
  • Issue
    5
  • fYear
    2003
  • Firstpage
    54
  • Lastpage
    60
  • Abstract
    Many projects in engineering and science require data classification based on different heuristics. Designers, for example, classify automobile engine performance as acceptable or unacceptable based on a combination of efficiency, emissions, noise levels, and other criteria. Researchers routinely classify documents as "relevant to the current project" or "irrelevant". Genome decoding divides chromosomes into genes, regulatory regions, signals, and so on. Pathologists identify cells as cancerous or benign. We can classify data into different groups by clustering data that are close with respect to some distance measure. In this project, we investigate the design, use, and pitfalls of a popular clustering algorithm, the k-means algorithm.
  • Keywords
    biology computing; cancer; cellular biophysics; genetics; pattern classification; pattern clustering; benign cells; cancerous cells; chromosomes; data classification; data clustering; distance measure; genes; genome decoding; k-means algorithm; pathology; regulatory regions; signals; Automobiles; Automotive engineering; Bioinformatics; Clustering algorithms; Data engineering; Decoding; Design engineering; Engines; Genomics; Noise level
  • fLanguage
    English
  • Journal_Title
    Computing in Science & Engineering
  • Publisher
    IEEE
  • ISSN
    1521-9615
  • Type
    jour
  • DOI
    10.1109/MCISE.2003.1225861
  • Filename
    1225861