Title :
An identification and prediction methods for feature-subsets of CpG islands methylation based on human peripheral blood leukocytes of chromosome 21q
Author :
Ali, Isse ; Mohamoud, Hussein Sheikh ali
Author_Institution :
Centre for Comput. Intell., De Montfort Univ., Leicester, UK
Date :
Aug. 30 2011-Sept. 3 2011
Abstract :
Advances in technology now allow methylated and unmethylated CpG islands to be classified using feature subsets derived from DNA sequence properties. Because methylation of CpG islands is involved in various biological phenomena, and DNA methylation is correlated with human diseases such as cancer, analysis of CpG islands has become important for characterising and modelling these phenomena and for understanding the mechanisms of such diseases. However, analysing the data associated with CpG islands is still a fairly new and challenging subject in bioinformatics, systems biology and epigenetics. This paper addresses the problems associated with predicting methylated and unmethylated CpG islands on human chromosome 21q. For the prediction, a data set of 132 CpG island samples from human peripheral blood leukocytes on chromosome 21q is used, and four feature subsets totalling 44 attributes that characterise the methylated and unmethylated groups are extracted for each sample. Because the data set is unbalanced, the leave-one-out (LOO) method is modified by incorporating the m-fold cross-validation approach, so as to avoid the disadvantages of the traditional LOO and m-fold cross-validation methods. A K-nearest neighbour classifier is then adapted for the prediction. The results of 440 different analyses show that methylated CpG islands can be distinguished from unmethylated CpG islands with a predictive accuracy of between 75% and 80%. More importantly, the modified LOO identifies the two groups more clearly and reliably when the feature subsets are combined. Another interesting observation is that, in the modified-LOO-based analysis, the CpGI-specific feature set achieves the highest predictive accuracy when combined with the other feature sets, which is not the case with the traditional LOO. This further supports the robustness of the modified-LOO cross-validation approach, since other studies have shown the CpGI-specific feature set to be among the most important and effective attributes.
Keywords :
DNA; biochemistry; biological techniques; biology computing; biomedical measurement; blood; cellular biophysics; medical computing; molecular biophysics; pattern classification; CpG island analysis; CpG islands methylation feature subsets; CpGI specific feature set; DNA methylation; DNA sequence properties; K-nearest neighbour classifier; chromosome 21q methylated CpG islands; chromosome 21q unmethylated CpG islands; data analysis; feature subset classification; human peripheral blood leukocytes; identification method; leave one out cross validation method; m-fold cross validation method; prediction method; Accuracy; Bioinformatics; Biological cells; DNA; Feature extraction; Genomics; Humans; Chromosomes, Human, Pair 21; CpG Islands; DNA Methylation; Humans; Leukocytes;
Conference_Title :
Engineering in Medicine and Biology Society, EMBC, 2011 Annual International Conference of the IEEE
Conference_Location :
Boston, MA
Print_ISBN :
978-1-4244-4121-1
Electronic_ISBN :
1557-170X
DOI :
10.1109/IEMBS.2011.6090879