Regional Information Center for Science and Technology - K-means clustering based compression algorithm for the high-throughput DNA sequence

DocumentCode :

1798869

Title :

K-means clustering based compression algorithm for the high-throughput DNA sequence

Author :

Li Tan ; Jifeng Sun

Author_Institution :

Sch. of Electron. & Inf. Eng., South China Univ. of Technol., Guangzhou, China

fYear :

2014

fDate :

7-9 July 2014

Firstpage :

952

Lastpage :

955

Abstract :

This paper proposes DNAC-K, a compression algorithm for high-throughput DNA sequences based on K-means clustering. DNAC-K first creates clusters of sequences with the K-means clustering method, then iterates the clustering according to the edit distances of subsequences, and finally adopts Huffman coding to encode the clustering result. Experimental results on several sequencing data sets show that DNAC-K performs better than many current high-throughput DNA sequence compression algorithms.
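
The abstract names three building blocks: clustering, edit distance, and Huffman coding. This record does not describe how the paper combines them, so the following is only a minimal sketch of the generic techniques, not the authors' DNAC-K implementation. It assumes Levenshtein distance as the edit distance and a plain nearest-center assignment step in place of a full K-means loop. The function names (`edit_distance`, `assign_to_centers`, `huffman_code`) are hypothetical.

```python
import heapq
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed with a rolling one-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def assign_to_centers(reads, centers):
    """Assumed assignment step: label each read with its closest center's index."""
    return [min(range(len(centers)), key=lambda k: edit_distance(r, centers[k]))
            for r in reads]


def huffman_code(symbols):
    """Build a prefix code (symbol -> bit string) from symbol frequencies."""
    freq = Counter(symbols)
    if not freq:
        return {}
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap entries are (weight, tie_breaker, {symbol: partial code}); the
    # unique tie_breaker keeps the dictionaries from ever being compared.
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


if __name__ == "__main__":
    reads = ["ACGT", "ACGA", "TTTT", "TTTA"]
    labels = assign_to_centers(reads, centers=["ACGT", "TTTT"])
    code = huffman_code("".join(reads))
    print(labels, code)
```

In this sketch, frequent symbols receive shorter bit strings, which is where the Huffman stage would save space once similar reads have been grouped together.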

Keywords :

DNA; Huffman codes; biology computing; data compression; encoding; pattern clustering; DNA sequence compression algorithms; DNAC-K; Huffman coding; K-means clustering based compression algorithm; edit distances; high-throughput DNA sequence; sequencing data sets; subsequences; Bioinformatics; Clustering algorithms; Clustering methods; Compression algorithms; Genomics; DNA sequence compression; K-means clustering; sequence alignment

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Audio, Language and Image Processing (ICALIP), 2014 International Conference on

Conference_Location :

Shanghai

Print_ISBN :

978-1-4799-3902-2

Type :

conf

DOI :

10.1109/ICALIP.2014.7009935

Filename :

7009935

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1798869