  • DocumentCode
    844543
  • Title
    On the Classification of a Small Imbalanced Cytogenetic Image Database
  • Author
    Lerner, Boaz ; Yeshaya, Josepha ; Koushnir, Lev
  • Author_Institution
    Dept. of Electr. & Comput. Eng., Ben-Gurion Univ. of the Negev, Beer-Sheva
  • Volume
    4
  • Issue
    2
  • fYear
    2007
  • Firstpage
    204
  • Lastpage
    215
  • Abstract
    Solving a multiclass classification task using a small, imbalanced database of high-dimensional patterns is difficult because of the curse of dimensionality and the bias of training toward the majority classes. Such a problem arose while diagnosing genetic abnormalities by classifying a small database of fluorescence in situ hybridization signals whose types occur with different frequencies. We propose two solutions to the problem and study them experimentally in the cytogenetic domain. The first is hierarchical decomposition of the classification task, where each hierarchy level is designed to tackle a simpler problem represented by approximately balanced classes. The second is balancing the data by up-sampling the minority classes, accompanied by dimensionality reduction. Implemented with the naive Bayesian classifier or the multilayer perceptron neural network, both solutions diminished the problem and improved accuracy. In addition, the experiments suggest that coping with the smallness of the data is more beneficial than dealing with its imbalance.
  • Keywords
    Bayes methods; biomedical optical imaging; cellular biophysics; fluorescence; genetics; image classification; medical image processing; multilayer perceptrons; Bayesian classifier; dimensionality reduction; fluorescence in situ hybridization signals; genetic abnormalities diagnosis; hierarchical decomposition; minority class up-sampling; multiclass classification task; multilayer perceptron neural network; small imbalanced cytogenetic image database; Bayesian methods; DNA; Fluorescence; Genetics; Image databases; Image sequence analysis; Marine animals; Multilayer perceptrons; Niobium compounds; Signal processing; Classification; dimensionality reduction; genetic diagnosis; imbalanced data; multilayer perceptron (MLP); naive Bayesian classifier (NBC); small sample size; Algorithms; Artificial Intelligence; Chromosome Mapping; Cluster Analysis; Cytogenetic Analysis; Databases, Factual; Humans; Image Interpretation, Computer-Assisted; In Situ Hybridization, Fluorescence; Information Storage and Retrieval; Microscopy, Fluorescence; Pattern Recognition, Automated; Sequence Analysis, DNA
  • fLanguage
    English
  • Journal_Title
    Computational Biology and Bioinformatics, IEEE/ACM Transactions on
  • Publisher
    IEEE
  • ISSN
    1545-5963
  • Type
    jour

  • DOI
    10.1109/TCBB.2007.070207
  • Filename
    4196532
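
The second solution named in the abstract, balancing the data by up-sampling the minority classes, can be illustrated with a minimal sketch. This record does not say how the paper performs the up-sampling, so random replication with replacement is an assumption here, and the function name `upsample_minority` and the toy data are hypothetical. The dimensionality reduction that accompanies up-sampling in the abstract is not shown.

```python
import random
from collections import Counter


def upsample_minority(samples, labels, seed=0):
    """Balance classes by randomly replicating minority-class samples.

    Assumption: replication with replacement up to the size of the
    largest class. The paper's actual procedure may differ.
    """
    rng = random.Random(seed)

    # Group the samples by class label.
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)

    # Every class is grown to the size of the largest class.
    target = max(len(items) for items in by_class.values())

    out_x, out_y = [], []
    for y, items in by_class.items():
        # Draw the missing samples at random from the class itself.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for x in items + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


# Toy data: three patterns of class "a", one of class "b".
xs, ys = upsample_minority([[0.1], [0.2], [0.3], [0.9]], ["a", "a", "a", "b"])
print(Counter(ys))
```

Replication adds no new information, so a sketch like this only removes the bias toward majority classes; it does nothing about the small sample size that the abstract identifies as the more important factor.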