Title :
Clustering of expressed sequence tags with distance measure based on Burrows-Wheeler transform
Author :
Ng, Keng-Hoong ; Phon-Amnuaisuk, Somnuk ; Ho, Chin-Kuan
Author_Institution :
Fac. of Inf. Technol., Multimedia Univ., Cyberjaya, Malaysia
Abstract :
Expressed sequence tags (ESTs) are used for gene discovery and transcriptome analysis. They are short, single-read fragments of expressed genes produced from mRNA extracted from a living cell. Clustering is a vital computational step in EST processing; its main goal is to ensure that all ESTs originating from the same mRNA are grouped together. EST clustering algorithms can be classified into two approaches: alignment-based and alignment-free. The latter has been preferred in recent years because of its faster speed and satisfactory results. In this paper, we propose and implement an alignment-free EST clustering algorithm that introduces a distance measure between ESTs based on a combination of the Burrows-Wheeler transform, window length and word-tuple. We assessed the proposed method with a dataset downloaded from UniGene. The preliminary results show high clustering quality, with clustering accuracy (evaluated using the F-measure) reaching up to 0.9671.
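As a rough illustration of two ingredients named in the abstract, the sketch below computes a Burrows-Wheeler transform by sorting rotations and counts overlapping word-tuples of length k. The function names `bwt` and `word_tuple_counts`, the `$` sentinel, and the choice of k are assumptions made for this example. The abstract does not specify how the transform, window length and word-tuple are combined into a distance, so this is not the authors' measure.

```python
from collections import Counter


def bwt(seq: str, sentinel: str = "$") -> str:
    """Burrows-Wheeler transform via sorted rotations (naive, for short inputs).

    Assumes the sentinel does not occur in seq and sorts before every
    symbol in it (true for '$' against upper- or lower-case letters).
    """
    if sentinel in seq:
        raise ValueError("sentinel must not occur in the sequence")
    text = seq + sentinel
    # Build every rotation of the text, sort them, and keep the last column.
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def word_tuple_counts(seq: str, k: int) -> Counter:
    """Count overlapping words (k-tuples) of length k in seq."""
    if k <= 0:
        raise ValueError("k must be positive")
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


if __name__ == "__main__":
    est = "ACGACGTT"  # made-up toy sequence, not taken from UniGene
    print(bwt(est))
    print(word_tuple_counts(est, 2))
```

Sorting all rotations costs quadratic memory in the sequence length; for a real EST collection a suffix-array-based construction would be the usual choice.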
Keywords :
biology computing; cellular biophysics; genetics; molecular biophysics; transforms; Burrows-Wheeler transform; clustering; distance measure; expressed sequence tags; gene discovery; mRNA; transcriptome analysis; Accuracy; Bioinformatics; Classification algorithms; Clustering algorithms; Data compression; Genomics; alignment-free
Conference_Title :
Biomedical Engineering and Informatics (BMEI), 2010 3rd International Conference on
Conference_Location :
Yantai
Print_ISBN :
978-1-4244-6495-1
DOI :
10.1109/BMEI.2010.5639798