DocumentCode :
534703
Title :
Clustering of expressed sequence tags with distance measure based on Burrows-Wheeler transform
Author :
Ng, Keng-Hoong ; Phon-Amnuaisuk, Somnuk ; Ho, Chin-Kuan
Author_Institution :
Fac. of Inf. Technol., Multimedia Univ., Cyberjaya, Malaysia
Volume :
5
fYear :
2010
fDate :
16-18 Oct. 2010
Firstpage :
2183
Lastpage :
2187
Abstract :
Expressed sequence tags (ESTs) are used for gene discovery and transcriptome analysis. They are short, single-read fragments of expressed genes, produced from mRNA extracted from a living cell. Clustering is a vital computational step in EST processing; its main goal is to ensure that all ESTs originating from the same mRNA are grouped together. EST clustering algorithms fall into two approaches: alignment-based and alignment-free. The latter has been preferred in recent years because it is faster and gives satisfactory results. In this paper, we propose and implement an alignment-free EST clustering algorithm that introduces a distance measure between ESTs combining the Burrows-Wheeler transform, window length, and word-tuples. We assessed the proposed method on a dataset downloaded from UniGene. Preliminary results show high clustering quality, with clustering accuracy (evaluated using the F-measure) of up to 0.9671.
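The abstract does not spell out the distance formula, so the following is only a minimal illustrative sketch of the general idea, not the authors' method. It applies a naive Burrows-Wheeler transform to each sequence and then compares word-tuple (k-mer) counts of the transformed strings. The function names, the `$` sentinel, the default tuple length `k=3`, and the normalised count-difference formula are all assumptions made for illustration; the window-length component mentioned in the abstract is omitted.

```python
from collections import Counter


def bwt(s: str, sentinel: str = "$") -> str:
    """Naive Burrows-Wheeler transform: sort all rotations, keep the last column."""
    if sentinel in s:
        raise ValueError("sentinel must not occur in the input")
    t = s + sentinel
    rotations = sorted(t[i:] + t[:i] for i in range(len(t)))
    return "".join(rotation[-1] for rotation in rotations)


def word_counts(s: str, k: int) -> Counter:
    """Count every overlapping word-tuple (k-mer) of length k in s."""
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))


def tuple_distance(a: str, b: str, k: int = 3) -> float:
    """Hypothetical distance in [0, 1]: normalised difference of the
    word-tuple counts taken over the BWT of each sequence (assumed formula)."""
    ca, cb = word_counts(bwt(a), k), word_counts(bwt(b), k)
    total = sum(ca.values()) + sum(cb.values())
    if total == 0:
        return 0.0
    diff = sum(abs(ca[w] - cb[w]) for w in ca.keys() | cb.keys())
    return diff / total


if __name__ == "__main__":
    print(bwt("banana"))
    print(tuple_distance("ACGTACGTAC", "ACGTACGTTC"))
```

A clustering step would then group ESTs whose pairwise distance falls below a chosen threshold; that threshold, like the formula above, is not given in the abstract.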
Keywords :
biology computing; cellular biophysics; genetics; molecular biophysics; transforms; Burrows-Wheeler transform; clustering; distance measure; expressed sequence tags; gene discovery; mRNA; transcriptome analysis; Accuracy; Bioinformatics; Classification algorithms; Clustering algorithms; Data compression; Genomics; alignment-free
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Biomedical Engineering and Informatics (BMEI), 2010 3rd International Conference on
Conference_Location :
Yantai
Print_ISBN :
978-1-4244-6495-1
Type :
conf
DOI :
10.1109/BMEI.2010.5639798
Filename :
5639798