A framework towards computational discovery of disease sub-types and associated (sub-)biomarkers

Author

Kurnaz, Mehmet Nadir ; Seker, Huseyin

Author_Institution

Dept. of Electr. & Electron. Eng., Nigde Univ., Nigde, Turkey

fYear

2013

fDate

3-7 July 2013

Firstpage

4074

Lastpage

4077

Abstract

Biomarker-related patient data are generally assessed in order to determine a relevant but generalized subset of the biomarkers. However, this approach fails to identify specific sub-groups of the patients or their corresponding biomarker subsets. This paper therefore proposes a novel framework that is capable of discovering disease sub-groups (types) and their associated biomarker subsets, which is expected to enable the discovery of personalized biomarker sets. The framework is based on a histogram of the Euclidean distances between the samples in a given data set. The t-test is used to select subset(s) of the biomarkers, whereas classification is performed by means of k-nearest neighbor, support vector machine and naive Bayes (NBayes) classifiers. The methods are assessed using leave-one-out cross-validation. As a case study, the method is applied to the analysis of male hypertension microarray data consisting of 159 patients and 22184 gene expressions. The method has helped identify specific sub-groups of the patients and their corresponding biomarker subsets. The results therefore suggest that the generalized biomarker subsets are not representative of the disease and that more focus should be placed on the sub-groups of patients and their biomarker subsets identified through the proposed approach. In particular, the threshold values over the histogram are observed to be crucial for discovering both the subsets of samples and the subsets of biomarkers, and can therefore be used to determine the complexity level of the study.
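
The building blocks named in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the function names, the bin count, the use of Welch's form of the t-statistic, the 1-nearest-neighbor classifier and the toy data are all assumptions made for illustration. The abstract does not say how thresholds over the histogram split the samples into sub-groups, so the sketch stops at building the histogram.

```python
import math
from itertools import combinations

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def distance_histogram(samples, n_bins=10):
    """Histogram of all pairwise Euclidean distances (n_bins is an assumption)."""
    dists = [euclidean(a, b) for a, b in combinations(samples, 2)]
    lo, hi = min(dists), max(dists)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width if all distances match
    counts = [0] * n_bins
    for d in dists:
        counts[min(int((d - lo) / width), n_bins - 1)] += 1
    edges = [lo + i * width for i in range(n_bins + 1)]
    return edges, counts

def welch_t(xs, ys):
    """Two-sample t-statistic (Welch's form, assumed here)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    vx = sum((x - mx) ** 2 for x in xs) / (len(xs) - 1)
    vy = sum((y - my) ** 2 for y in ys) / (len(ys) - 1)
    se = math.sqrt(vx / len(xs) + vy / len(ys))
    return (mx - my) / se if se > 0 else 0.0

def top_features(samples, labels, k):
    """Indices of the k features with the largest |t| between classes 0 and 1."""
    scores = []
    for j in range(len(samples[0])):
        g0 = [s[j] for s, y in zip(samples, labels) if y == 0]
        g1 = [s[j] for s, y in zip(samples, labels) if y == 1]
        scores.append((abs(welch_t(g0, g1)), j))
    return [j for _, j in sorted(scores, reverse=True)[:k]]

def loocv_1nn_accuracy(samples, labels, k_features):
    """Leave-one-out accuracy of 1-NN; features are re-selected in every fold."""
    correct = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        feats = top_features(train_x, train_y, k_features)
        test = [samples[i][j] for j in feats]
        nearest = min(
            range(len(train_x)),
            key=lambda t: euclidean([train_x[t][j] for j in feats], test),
        )
        correct += int(train_y[nearest] == labels[i])
    return correct / len(samples)

# Toy data (hypothetical): 6 samples, 3 features, only feature 0 separates the classes.
X = [[1.0, 5.0, 0.2], [1.2, 4.0, 0.1], [0.9, 6.0, 0.3],
     [3.0, 5.5, 0.2], [3.2, 4.5, 0.1], [2.9, 5.0, 0.3]]
y = [0, 0, 0, 1, 1, 1]

edges, counts = distance_histogram(X, n_bins=5)
selected = top_features(X, y, k=1)
accuracy = loocv_1nn_accuracy(X, y, k_features=1)
```

Re-selecting the features inside each leave-one-out fold keeps the held-out sample from influencing the t-test ranking. Whether the paper does this is not stated in the abstract.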

Keywords

Bayes methods; bioinformatics; data analysis; diseases; genetics; genomics; lab-on-a-chip; support vector machines; Euclidean distance; computational discovery; disease biomarker subset selection; disease subgroup; gene expression; histogram utilization; hypertension microarray data analysis; k-nearest neighbor; leave-one-out cross validation; naive Bayes classifier; patient data; personalized biomarker set discovery; support vector machine; Accuracy; Bioinformatics; Biological system modeling; Diseases; Histograms; Hypertension; Support vector machines

fLanguage

English

Publisher

IEEE

Conference_Titel

Engineering in Medicine and Biology Society (EMBC), 2013 35th Annual International Conference of the IEEE

Conference_Location

Osaka

ISSN

1557-170X

Type

conf

DOI

10.1109/EMBC.2013.6610440

Filename

6610440