Title :
On the Complexity of Gene Expression Classification Data Sets
Author :
Lorena, Ana C. ; Costa, Ivan G. ; de Souto, Marcilio C. P.
Author_Institution :
Center of Math., Cognition ABC Fed. Univ., Santo Andre
Abstract :
One of the main computational tasks involving gene expression data is the construction of classifiers (models), often via some machine learning (ML) technique applied to given data sets, to automatically discriminate expression patterns of cancer (tumor) tissues from those of normal tissues, or among cancer subtypes. A distinctive characteristic of these data sets is their high dimensionality combined with a small number of data items. This characteristic makes the induction of accurate ML models difficult (e.g., it can lead to model overfitting). In this context, we present an empirical study of the complexity of the classification task posed by cancer-related gene expression data sets. To do so, we measure the complexity of the ML models used to classify the tumors. The results indicate that most of these data sets can be effectively discriminated by a simple linear function.
Keywords :
biology computing; cancer; data analysis; genetics; learning (artificial intelligence); pattern classification; tumours; computational task; expression pattern discrimination; gene expression classification data set; linear function; machine learning; microarray data set analysis; normal tissues; tumor classification; Algorithm design and analysis; Biology computing; Cancer; Data analysis; Gene expression; Machine learning; Mathematics; Neoplasms; Support vector machine classification; Support vector machines; classifier complexity; gene expression
Conference_Title :
Hybrid Intelligent Systems, 2008. HIS '08. Eighth International Conference on
Conference_Location :
Barcelona
Print_ISBN :
978-0-7695-3326-1
Electronic_ISBN :
978-0-7695-3326-1
DOI :
10.1109/HIS.2008.163