Title :
Clustering of SNP data based on SCLIQUE
Author :
Jia, Min ; Wu, Yue ; Lei, Zhou ; Liu, Zongtian
Author_Institution :
Comput. Eng. & Sci, Shanghai Univ., Shanghai, China
Abstract :
SNP clustering is an indispensable exploratory tool for biology researchers: it can identify co-expressed or co-regulated genes and predict the functions of unknown genes from known genes in the same cluster. The CLIQUE clustering algorithm is an effective way to solve high-dimensional clustering problems, but it is not applicable to categorical data. Single nucleotide polymorphisms (SNPs) are single base-pair positions in genomic DNA at which different sequence alternatives (alleles) exist in normal individuals in some population(s). SNP data consist of genotype values and are therefore categorical. In this paper, we improve the CLIQUE algorithm for SNP clustering in three respects: redefining the grid division, redefining the common face between two units, and redefining the rules for generating high-dimensional candidate dense units. Experiments show that the proposed algorithm, SCLIQUE, not only retains the advantages of CLIQUE but also extends CLIQUE clustering from numerical space to categorical space.
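The three modifications named in the abstract can be illustrated with a minimal CLIQUE-style sketch for categorical genotype data. This is not the SCLIQUE procedure from the paper, whose exact rules the abstract does not state. It assumes: (1) each distinct genotype value in an SNP column is one grid cell; (2) higher-dimensional candidate dense units are generated Apriori-style, as in the original CLIQUE, by joining (k-1)-dimensional dense units on distinct SNPs and pruning any candidate with a non-dense projection; and (3) a hypothetical common-face rule under which two units in the same subspace are adjacent if their genotypes differ in exactly one SNP. The names `support`, `dense_units_1d`, `candidates`, `mine_dense_units`, `share_face`, `clusters`, and the count threshold `min_count` are illustrative only.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Sequence, Tuple

# A unit is a set of (SNP column index, genotype value) pairs.
Unit = FrozenSet[Tuple[int, str]]


def support(unit: Unit, data: Sequence[Sequence[str]]) -> int:
    """Count rows whose genotypes match every (column, value) pair in the unit."""
    return sum(all(row[d] == v for d, v in unit) for row in data)


def dense_units_1d(data: Sequence[Sequence[str]], min_count: int) -> List[Unit]:
    """ASSUMED grid division: each distinct genotype in a column is one cell."""
    n_dims = len(data[0]) if data else 0
    units = []
    for d in range(n_dims):
        for v in sorted({row[d] for row in data}):
            u = frozenset({(d, v)})
            if support(u, data) >= min_count:
                units.append(u)
    return units


def candidates(prev: List[Unit], k: int) -> List[Unit]:
    """Apriori-style join of (k-1)-dim dense units into k-dim candidates."""
    prev_set = set(prev)
    out = set()
    for a, b in combinations(prev, 2):
        u = a | b
        # Keep only unions that cover k distinct SNP columns.
        if len(u) == k and len({d for d, _ in u}) == k:
            # Monotonicity prune: every (k-1)-dim projection must be dense.
            if all(frozenset(s) in prev_set for s in combinations(u, k - 1)):
                out.add(u)
    return sorted(out, key=sorted)


def mine_dense_units(data: Sequence[Sequence[str]], min_count: int) -> Dict[int, List[Unit]]:
    """Bottom-up search: dense units per dimensionality k."""
    levels = {1: dense_units_1d(data, min_count)}
    k = 2
    while levels[k - 1]:
        dense = [u for u in candidates(levels[k - 1], k) if support(u, data) >= min_count]
        if not dense:
            break
        levels[k] = dense
        k += 1
    return levels


def share_face(a: Unit, b: Unit) -> bool:
    """HYPOTHETICAL common face: same subspace, genotypes differ in exactly one SNP."""
    da, db = dict(a), dict(b)
    if da.keys() != db.keys():
        return False
    return sum(da[d] != db[d] for d in da) == 1


def clusters(units: List[Unit]) -> List[List[Unit]]:
    """Connected components of units under share_face (iterative DFS)."""
    seen, comps = set(), []
    for start in units:
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:
            u = stack.pop()
            comp.append(u)
            for w in units:
                if w not in seen and share_face(u, w):
                    seen.add(w)
                    stack.append(w)
        comps.append(comp)
    return comps


# Toy genotype matrix: rows are individuals, columns are two SNPs.
genotypes = [
    ["AA", "CC"],
    ["AA", "CC"],
    ["AA", "CT"],
    ["AA", "CT"],
    ["GG", "TT"],
]
levels = mine_dense_units(genotypes, min_count=2)
# levels[2] holds two dense 2-D units: {(0,'AA'),(1,'CC')} and {(0,'AA'),(1,'CT')}
groups = clusters(levels[2])
# Under the hypothetical face rule these two units form a single cluster.
```

Because categorical genotypes have no natural order, the adjacency test is the main design choice in such a sketch; a different common-face definition only requires changing `share_face`.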
Keywords :
DNA; biology computing; genetics; genomics; molecular biophysics; pattern clustering; CLIQUE clustering algorithm; SCLIQUE algorithm; SNP data clustering; biology researchers; categorical data; coexpression genes; coregulated genes; genes cluster; genomic DNA; genotype value; grids division; high-dimensional candidate dense units generation; high-dimensional clustering problems; single base pair positions; single nucleotide polymorphisms; Accuracy; Algorithm design and analysis; Rocks; SCLIQUE algorithm; SNP clustering; categorical data; high dimensional clustering;
Conference_Title :
Computer Science and Network Technology (ICCSNT), 2011 International Conference on
Conference_Location :
Harbin
Print_ISBN :
978-1-4577-1586-0
DOI :
10.1109/ICCSNT.2011.6182446