DocumentCode
3201854
Title
A framework towards computational discovery of disease sub-types and associated (sub-)biomarkers
Author
Kurnaz, Mehmet Nadir ; Seker, Huseyin
Author_Institution
Dept. of Electr. & Electron. Eng., Nigde Univ., Nigde, Turkey
fYear
2013
fDate
3-7 July 2013
Firstpage
4074
Lastpage
4077
Abstract
Biomarker-related patient data are generally assessed in order to determine a relevant but generalized subset of biomarkers. However, this kind of assessment fails to identify specific sub-groups of patients or their corresponding biomarker subsets. This paper therefore proposes a novel framework that is capable of discovering disease sub-groups (sub-types) and their associated biomarker subsets, which is expected to enable the discovery of personalized biomarker sets. The framework is based on a histogram of the Euclidean distances between the samples in a given data set. The t-test is used to select the biomarker subset(s), whereas classification is performed by means of k-nearest neighbor, support vector machine and naive Bayes (NBayes) classifiers. The methods are assessed with leave-one-out cross-validation. As a case study, the method is applied to the analysis of male hypertension microarray data consisting of 159 patients and 22184 gene expressions. The method has helped identify specific sub-groups of the patients and their corresponding biomarker subsets. The results therefore suggest that the generalized biomarker subsets are not representative of the disease, and that more focus should be placed on the patient sub-groups and the biomarker subsets identified through the proposed approach. In particular, the threshold values applied to the histogram are observed to be crucial for discovering both the sample subsets and the biomarker subsets, and can therefore be used to determine the complexity level of the study.
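The building blocks named in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify how thresholds on the histogram define sub-groups, so the sketch only computes the histogram; Welch's form of the t statistic, k = 3, the bin count, the function names and the toy data are all hypothetical choices; and only the k-nearest neighbor classifier is shown.

```python
import math
from collections import Counter

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def distance_histogram(samples, n_bins=10):
    """Histogram of all pairwise Euclidean distances between samples."""
    dists = [euclidean(samples[i], samples[j])
             for i in range(len(samples))
             for j in range(i + 1, len(samples))]
    lo, hi = min(dists), max(dists)
    width = (hi - lo) / n_bins or 1.0  # guard against identical distances
    counts = [0] * n_bins
    for d in dists:
        counts[min(int((d - lo) / width), n_bins - 1)] += 1
    edges = [lo + k * width for k in range(n_bins + 1)]
    return counts, edges

def t_statistic(x, y):
    """Welch's two-sample t statistic for one feature (assumed variant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    vx = sum((v - mx) ** 2 for v in x) / (len(x) - 1)
    vy = sum((v - my) ** 2 for v in y) / (len(y) - 1)
    denom = math.sqrt(vx / len(x) + vy / len(y))
    return (mx - my) / denom if denom > 0 else 0.0

def select_features(samples, labels, n_features):
    """Indices of the n_features features with the largest |t| (classes 0/1)."""
    scores = []
    for f in range(len(samples[0])):
        x = [s[f] for s, l in zip(samples, labels) if l == 0]
        y = [s[f] for s, l in zip(samples, labels) if l == 1]
        scores.append((abs(t_statistic(x, y)), f))
    return [f for _, f in sorted(scores, reverse=True)[:n_features]]

def knn_predict(train, train_labels, query, k=3):
    """Majority vote among the k training samples closest to query."""
    dists = (euclidean(t, query) for t in train)
    nearest = sorted(zip(dists, train_labels))[:k]
    return Counter(l for _, l in nearest).most_common(1)[0][0]

def loocv_accuracy(samples, labels, n_features, k=3):
    """Leave-one-out accuracy; features are re-selected inside every fold."""
    correct = 0
    for i in range(len(samples)):
        train = samples[:i] + samples[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        feats = select_features(train, train_labels, n_features)
        project = lambda s: [s[f] for f in feats]
        pred = knn_predict([project(s) for s in train], train_labels,
                           project(samples[i]), k)
        correct += pred == labels[i]
    return correct / len(samples)

# Hypothetical toy data: 8 samples, 3 features; only feature 0 separates the classes.
samples = [[0.0, 5.0, 1.0], [0.2, 1.0, 1.1], [0.1, 3.0, 0.9], [0.3, 4.0, 1.0],
           [2.0, 5.0, 1.0], [2.2, 1.5, 1.1], [2.1, 3.5, 0.9], [2.3, 4.5, 1.0]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

counts, edges = distance_histogram(samples, n_bins=5)
top = select_features(samples, labels, 1)
acc = loocv_accuracy(samples, labels, n_features=1)
```

Re-selecting the features inside each leave-one-out fold keeps the held-out sample from influencing the t-test ranking; whether the paper does this is not stated in the abstract.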
Keywords
Bayes methods; bioinformatics; data analysis; diseases; genetics; genomics; lab-on-a-chip; support vector machines; Euclidean distance; computational discovery; disease biomarker subset selection; disease subgroup; gene expression; histogram utilization; hypertension microarray data analysis; k-nearest neighbor; leave-one-out cross validation; naive Bayes classifier; patient data; personalized biomarker set discovery; support vector machine; Accuracy; Bioinformatics; Biological system modeling; Diseases; Histograms; Hypertension; Support vector machines
fLanguage
English
Publisher
IEEE
Conference_Titel
Engineering in Medicine and Biology Society (EMBC), 2013 35th Annual International Conference of the IEEE
Conference_Location
Osaka
ISSN
1557-170X
Type
conf
DOI
10.1109/EMBC.2013.6610440
Filename
6610440