DocumentCode :
3201854
Title :
A framework towards computational discovery of disease sub-types and associated (sub-)biomarkers
Author :
Kurnaz, Mehmet Nadir ; Seker, Huseyin
Author_Institution :
Dept. of Electr. & Electron. Eng., Nigde Univ., Nigde, Turkey
fYear :
2013
fDate :
3-7 July 2013
Firstpage :
4074
Lastpage :
4077
Abstract :
Biomarker-related patient data are generally assessed in order to determine a relevant but generalized subset of biomarkers. However, such an assessment fails to identify specific sub-groups of patients or their corresponding subsets of biomarkers. This paper therefore proposes a novel framework capable of discovering disease sub-groups (types) and their associated biomarker subsets, which is expected to enable the discovery of personalized biomarker sets. The framework is based on a histogram obtained from the Euclidean distances between the samples in a given data set. The t-test is used to select the biomarker subset(s), whereas classification is performed by k-nearest neighbor, support vector machine and naive Bayes (NBayes) classifiers. The methods are assessed with leave-one-out cross-validation. As a case study, the method is applied to male hypertension microarray data consisting of 159 patients and 22184 gene expressions. The method helped identify specific sub-groups of patients and their corresponding biomarker subsets. The results therefore suggest that generalized biomarker subsets are not representative of the disease, and that more focus should be placed on the patient sub-groups and biomarker subsets identified through the proposed approach. In particular, the threshold values over the histogram are observed to be crucial for discovering the subsets of both samples and biomarkers, and can therefore be used to determine the complexity level of the study.
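The building blocks named in the abstract (a histogram of pairwise Euclidean distances, t-test ranking of biomarkers, and leave-one-out evaluation of a nearest-neighbor classifier) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy data, the use of Welch's form of the t statistic, the choice of 1-NN, and the decision to re-rank features inside each fold are all assumptions made for illustration. The abstract does not say how thresholds over the histogram define the sub-groups, so that step is deliberately left out.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_histogram(samples, n_bins):
    """Bin counts of all pairwise Euclidean distances between samples."""
    dists = [euclidean(a, b) for a, b in combinations(samples, 2)]
    lo, hi = min(dists), max(dists)
    width = (hi - lo) / n_bins or 1.0  # guard against identical distances
    counts = [0] * n_bins
    for d in dists:
        # Clamp so the maximum distance falls in the last bin.
        counts[min(int((d - lo) / width), n_bins - 1)] += 1
    return counts


def t_statistic(x, y):
    """Welch two-sample t statistic (assumed variant; needs >= 2 values per group)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    vx = sum((v - mx) ** 2 for v in x) / (len(x) - 1)
    vy = sum((v - my) ** 2 for v in y) / (len(y) - 1)
    denom = math.sqrt(vx / len(x) + vy / len(y))
    return (mx - my) / denom if denom > 0 else 0.0


def top_features(samples, labels, k):
    """Indices of the k features with the largest |t| between classes 0 and 1."""
    scores = []
    for j in range(len(samples[0])):
        pos = [s[j] for s, y in zip(samples, labels) if y == 1]
        neg = [s[j] for s, y in zip(samples, labels) if y == 0]
        scores.append(abs(t_statistic(pos, neg)))
    return sorted(range(len(scores)), key=lambda j: -scores[j])[:k]


def loo_accuracy_1nn(samples, labels, k_features):
    """Leave-one-out accuracy of 1-NN, with features re-ranked in every fold."""
    correct = 0
    for i in range(len(samples)):
        train = [s for j, s in enumerate(samples) if j != i]
        train_y = [y for j, y in enumerate(labels) if j != i]
        feats = top_features(train, train_y, k_features)
        held_out = [samples[i][f] for f in feats]
        nearest = min(
            range(len(train)),
            key=lambda j: euclidean([train[j][f] for f in feats], held_out),
        )
        correct += train_y[nearest] == labels[i]
    return correct / len(samples)


# Hypothetical toy data: 6 samples, 3 features; only feature 0 separates the classes.
toy_samples = [
    [1.0, 5.0, 0.2], [1.2, 4.8, 0.9], [0.9, 5.1, 0.4],
    [3.0, 5.0, 0.3], [3.2, 4.9, 0.8], [2.9, 5.2, 0.5],
]
toy_labels = [0, 0, 0, 1, 1, 1]

if __name__ == "__main__":
    print(distance_histogram(toy_samples, n_bins=4))
    print(top_features(toy_samples, toy_labels, k=1))
    print(loo_accuracy_1nn(toy_samples, toy_labels, k_features=1))
```

Ranking the features inside each fold keeps the held-out sample from influencing the selection; whether the paper does this is not stated in the abstract.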
Keywords :
Bayes methods; bioinformatics; data analysis; diseases; genetics; genomics; lab-on-a-chip; support vector machines; Euclidean distance; computational discovery; disease biomarker subset selection; disease subgroup; gene expression; histogram utilization; hypertension microarray data analysis; k-nearest neighbor; leave-one-out cross validation; naive Bayes classifier; patient data; personalized biomarker set discovery; support vector machine; Accuracy; Bioinformatics; Biological system modeling; Diseases; Histograms; Hypertension; Support vector machines
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Engineering in Medicine and Biology Society (EMBC), 2013 35th Annual International Conference of the IEEE
Conference_Location :
Osaka
ISSN :
1557-170X
Type :
conf
DOI :
10.1109/EMBC.2013.6610440
Filename :
6610440