DocumentCode
534703
Title
Clustering of expressed sequence tags with distance measure based on Burrows-Wheeler transform
Author
Ng, Keng-Hoong ; Phon-Amnuaisuk, Somnuk ; Ho, Chin-Kuan
Author_Institution
Fac. of Inf. Technol., Multimedia Univ., Cyberjaya, Malaysia
Volume
5
fYear
2010
fDate
16-18 Oct. 2010
Firstpage
2183
Lastpage
2187
Abstract
Expressed sequence tags (ESTs) are a technology used for gene discovery and transcriptome analysis. They are short, single-read fragments of expressed genes produced from mRNA extracted from a living cell. Clustering is a vital computational step in processing ESTs; its main goal is to ensure that all ESTs originating from the same mRNA are grouped together. EST clustering algorithms can be broadly classified into two approaches: alignment-based and alignment-free. The latter has been preferred in recent years because of its greater speed and satisfactory results. In this paper, we propose and implement an alignment-free EST clustering algorithm in which we introduce a distance measure between ESTs that combines the Burrows-Wheeler transform, window length, and word-tuple. We assess the proposed method on a dataset downloaded from UniGene. Preliminary results show high clustering quality, with clustering accuracy (evaluated using the F-measure) reaching up to 0.9671.
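The abstract names two standard building blocks: the Burrows-Wheeler transform (BWT) of a sequence and an F-measure for scoring clustering accuracy. This record does not say how the authors combine the BWT with window length and word-tuple, so the sketch below covers only those two standard pieces. It is not the authors' distance measure or implementation. The function names `bwt` and `clustering_f_measure` are hypothetical. The F-measure shown is the common class-size-weighted variant; whether the paper uses exactly this variant is an assumption.

```python
from collections import Counter


def bwt(seq: str, sentinel: str = "$") -> str:
    """Burrows-Wheeler transform via sorted cyclic rotations.

    Naive O(n^2 log n) construction, adequate for short EST reads.
    The sentinel sorts before A/C/G/T in ASCII and must not occur in seq.
    """
    if sentinel in seq:
        raise ValueError("sentinel must not occur in the sequence")
    s = seq + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The BWT is the last column of the sorted rotation matrix.
    return "".join(rot[-1] for rot in rotations)


def clustering_f_measure(true_labels, pred_labels) -> float:
    """Class-weighted clustering F-measure (assumed variant).

    For each true class c, take the best F1 score against any predicted
    cluster k, then weight that score by the class size |c| / n.
    """
    n = len(true_labels)
    classes = Counter(true_labels)             # |c| for each true class
    clusters = Counter(pred_labels)            # |k| for each predicted cluster
    joint = Counter(zip(true_labels, pred_labels))  # |c intersect k|

    total = 0.0
    for c, n_c in classes.items():
        best = 0.0
        for k, n_k in clusters.items():
            n_ck = joint.get((c, k), 0)
            if n_ck == 0:
                continue
            precision = n_ck / n_k
            recall = n_ck / n_c
            best = max(best, 2 * precision * recall / (precision + recall))
        total += (n_c / n) * best
    return total


if __name__ == "__main__":
    print(bwt("ACGT"))  # T$ACG
    # A perfect clustering scores 1.0 regardless of cluster label names.
    print(clustering_f_measure(["a", "a", "b", "b"], [1, 1, 2, 2]))  # 1.0
```

For large EST collections, building the BWT from a suffix array is the usual replacement for sorting every rotation explicitly. The F-measure rewards each true gene class for its single best-matching cluster, so merging two classes into one cluster lowers the score. In the example above, putting both classes into one cluster gives 2/3.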
Keywords
biology computing; cellular biophysics; genetics; molecular biophysics; transforms; Burrows-Wheeler transform; clustering; distance measure; expressed sequence tags; gene discovery; mRNA; transcriptome analysis; Accuracy; Bioinformatics; Classification algorithms; Clustering algorithms; Data compression; Genomics; Transforms; alignment-free; burrows-wheeler transform; clustering; distance measure; expressed sequence tag;
fLanguage
English
Publisher
IEEE
Conference_Titel
Biomedical Engineering and Informatics (BMEI), 2010 3rd International Conference on
Conference_Location
Yantai
Print_ISBN
978-1-4244-6495-1
Type
conf
DOI
10.1109/BMEI.2010.5639798
Filename
5639798