• DocumentCode
    534703
  • Title

    Clustering of expressed sequence tags with distance measure based on Burrows-Wheeler transform

  • Author

    Ng, Keng-Hoong ; Phon-Amnuaisuk, Somnuk ; Ho, Chin-Kuan

  • Author_Institution
    Fac. of Inf. Technol., Multimedia Univ., Cyberjaya, Malaysia
  • Volume
    5
  • fYear
    2010
  • fDate
    16-18 Oct. 2010
  • Firstpage
    2183
  • Lastpage
    2187
  • Abstract
    Expressed sequence tags (ESTs) are used for gene discovery and transcriptome analysis. They are short, single-read fragments of expressed genes produced from mRNA extracted from a living cell. Clustering is a vital computational step in EST processing; its main goal is to ensure that all ESTs originating from the same mRNA are grouped together. EST clustering algorithms fall into two approaches: alignment-based and alignment-free. The latter has been preferred in recent years because it is faster and gives satisfactory results. In this paper, we propose and implement an alignment-free EST clustering algorithm that introduces a distance measure between ESTs combining the Burrows-Wheeler transform, window length, and word-tuples. We assess the proposed method on a dataset downloaded from UniGene. Preliminary results show high clustering quality, with clustering accuracy (evaluated using the F-measure) reaching up to 0.9671.
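    The abstract names the ingredients of the distance measure (Burrows-Wheeler transform, window length, word-tuple) but does not give its formula. The following is only a minimal sketch of how such ingredients could be combined, not the authors' method. The function names (`bwt`, `tuple_counts`, `est_distance`), the `$` sentinel, the Euclidean comparison of word-tuple counts, the minimum over window pairs, and the default values of `window` and `k` are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def bwt(s, sentinel="$"):
    """Naive Burrows-Wheeler transform using sorted rotations.

    Quadratic in memory; fine for short EST windows, not for genomes.
    """
    if sentinel in s:
        raise ValueError("sentinel must not occur in the input")
    t = s + sentinel
    rotations = sorted(t[i:] + t[:i] for i in range(len(t)))
    # The transform is the last column of the sorted rotation matrix.
    return "".join(r[-1] for r in rotations)


def tuple_counts(s, k):
    """Count overlapping word-tuples (k-mers) of length k in s."""
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))


def windows(s, window):
    """All overlapping windows of the given length (whole string if shorter)."""
    return [s[i:i + window] for i in range(max(1, len(s) - window + 1))]


def window_distance(a, b, k):
    """Euclidean distance between k-tuple counts of the two transformed windows.

    Assumption: this particular comparison is illustrative only.
    """
    ca, cb = tuple_counts(bwt(a), k), tuple_counts(bwt(b), k)
    return sqrt(sum((ca[w] - cb[w]) ** 2 for w in set(ca) | set(cb)))


def est_distance(x, y, window=20, k=3):
    """Hypothetical EST distance: smallest window-to-window distance."""
    return min(
        window_distance(a, b, k)
        for a in windows(x, window)
        for b in windows(y, window)
    )


if __name__ == "__main__":
    print(bwt("banana"))
    print(est_distance("ACGTACGTACGT", "ACGTTCGTACGA", window=8, k=2))
```

    Taking the minimum over window pairs means two ESTs that share one similar region score as close even if they differ elsewhere; whether the paper does this is not stated in the abstract.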
  • Keywords
    biology computing; cellular biophysics; genetics; molecular biophysics; transforms; Burrows-Wheeler transform; clustering; distance measure; expressed sequence tags; gene discovery; mRNA; transcriptome analysis; Accuracy; Bioinformatics; Classification algorithms; Clustering algorithms; Data compression; Genomics; Transforms; alignment-free
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Biomedical Engineering and Informatics (BMEI), 2010 3rd International Conference on
  • Conference_Location
    Yantai
  • Print_ISBN
    978-1-4244-6495-1
  • Type

    conf

  • DOI
    10.1109/BMEI.2010.5639798
  • Filename
    5639798