  • DocumentCode
    59878
  • Title
    A Hierarchical Clustering Method of Selecting Kernel SNP to Unify Informative SNP and Tag SNP
  • Author
    Bo Liao ; Xiong Li ; Lijun Cai ; Zhi Cao ; Haowen Chen
  • Author_Institution
    Coll. of Inf. Sci. & Eng., Hunan Univ., Changsha, China
  • Volume
    12
  • Issue
    1
  • fYear
    2015
  • fDate
    Jan.-Feb. 1 2015
  • Firstpage
    113
  • Lastpage
    122
  • Abstract
    Various strategies can be used to select representative single nucleotide polymorphisms (SNPs) from a large number of SNPs, such as tag SNPs for haplotype coverage and informative SNPs for haplotype reconstruction. Representative SNPs are not only instrumental in reducing the cost of genotyping, but also serve an important function in narrowing the combinatorial space in epistasis analysis. The capacity of kernel SNPs to unify informative SNPs and tag SNPs is explored, and inconsistencies are minimized in further studies. The correlation between multiple SNPs is formalized using multi-information measures. Extending this correlation, a distance formula for measuring the similarity between clusters is first designed to conduct hierarchical clustering. The hierarchical clustering incorporates both information gain and haplotype diversity, so that the proposed approach can achieve unification. The kernel SNPs are then selected from every cluster through a top-rank or backward elimination scheme. Using these kernel SNPs, extensive experimental comparisons are conducted with informative SNPs on haplotype reconstruction accuracy and with tag SNPs on haplotype coverage. Results indicate that kernel SNPs can practically unify informative SNPs and tag SNPs and are therefore adaptable to various applications.
  • Keywords
    bioinformatics; genetics; genomics; pattern clustering; polymorphism; backward elimination scheme; combinatorial space; epistasis analysis; genotyping; haplotype coverage; haplotype diversity; haplotype reconstruction accuracy; hierarchical clustering method; information gain; multiinformation measures; representative single nucleotide polymorphisms; selecting kernel SNP; tag SNP; unify informative SNP; Accuracy; Bioinformatics; Computational complexity; Correlation; Entropy; IEEE transactions; Kernel; Tag SNP; clustering; informative SNP; support vector machine
  • fLanguage
    English
  • Journal_Title
    Computational Biology and Bioinformatics, IEEE/ACM Transactions on
  • Publisher
    IEEE
  • ISSN
    1545-5963
  • Type
    jour
  • DOI
    10.1109/TCBB.2014.2351797
  • Filename
    6894187
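
The abstract states that the correlation between multiple SNPs is formalized using multi-information measures. As a rough illustration only, the sketch below computes the standard multi-information (total correlation), i.e. the sum of marginal entropies minus the joint entropy, over selected SNP columns of a toy haplotype set. The function names `entropy` and `multi_information`, the 0/1 allele encoding, and the toy data are assumptions made for this example; the record does not give the paper's exact formulation or its cluster distance formula, so this should not be read as the authors' method.

```python
from collections import Counter
from math import log2


def entropy(items):
    """Shannon entropy (in bits) of the empirical distribution of a list of hashable items."""
    counts = Counter(items)
    n = len(items)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def multi_information(haplotypes, snps):
    """Total correlation of the chosen SNP columns.

    Computed as the sum of per-SNP marginal entropies minus the joint
    entropy of the selected columns (hypothetical helper for illustration).
    """
    marginals = sum(entropy([h[j] for h in haplotypes]) for j in snps)
    joint = entropy([tuple(h[j] for j in snps) for h in haplotypes])
    return marginals - joint


# Toy data (assumed): four haplotypes over three biallelic SNPs, alleles coded 0/1.
haplotypes = ["000", "001", "110", "111"]

# SNPs 0 and 1 always carry the same allele, so they share one bit of information.
linked = multi_information(haplotypes, [0, 1])

# SNPs 0 and 2 vary independently in this sample, so they share none.
unlinked = multi_information(haplotypes, [0, 2])
```

In this toy sample, a high value for a pair of SNPs suggests that one of them is redundant given the other, which is the kind of signal a clustering-based selection of representative SNPs could exploit.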