  • DocumentCode
    2710968
  • Title
    CpG-discover: A machine learning approach for CpG islands identification from human DNA sequence
  • Author
    Lan, Man ; Xu, Yu ; Li, Lin ; Wang, Fei ; Zuo, Ying ; Chen, Yuan ; Tan, Chew Lim ; Su, Jian
  • Author_Institution
    Dept. of Comput. Sci. & Technol., East China Normal Univ., Shanghai, China
  • fYear
    2009
  • fDate
    14-19 June 2009
  • Firstpage
    1702
  • Lastpage
    1707
  • Abstract
    CpG islands (CGIs) play a fundamental role in genome analysis as genomic markers and tumor markers. Identification of potential CGIs has contributed not only to the prediction of promoters of most house-keeping genes and many tissue-specific genes but also to the understanding of the epigenetic causes of cancer. Most current methods for identifying CGIs suffer from various limitations and require considerable human intervention during the search. In this paper, we implement an HMM-based CGI identification system, namely CpG-Discover. Experiments were conducted on the EMBL human DNA database, with comparisons against other widely used tools. The controlled experimental results indicate that our system is a promising tool capable of locating CGIs accurately. In addition, our system differs significantly from other tools in that it avoids the disadvantages of sliding windows and reduces the human intervention needed to search for or combine potential CGIs (such as setting thresholds for the initial density or distance seed). Therefore, given an annotated training data set, our system can be adapted to find other specific nucleotide sequences in DNA.
  • Keywords
    DNA; biological tissues; biology computing; cancer; genetics; hidden Markov models; learning (artificial intelligence); CpG islands identification; CpG-Discover; EMBL human DNA database; cancer; epigenetic causes; genome analysis; genomic markers; hidden Markov model; house-keeping genes; human DNA sequence; machine learning approach; nucleotides sequences; tissue-specific genes; tumor markers; Bioinformatics; Cancer; Control systems; DNA; Databases; Genomics; Humans; Machine learning; Neoplasms; Sequences;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Neural Networks, 2009. IJCNN 2009. International Joint Conference on
  • Conference_Location
    Atlanta, GA
  • ISSN
    1098-7576
  • Print_ISBN
    978-1-4244-3548-7
  • Electronic_ISBN
    1098-7576
  • Type
    conf
  • DOI
    10.1109/IJCNN.2009.5178863
  • Filename
    5178863