• DocumentCode
    3341907
  • Title

    Duplicate Detection for Protein Data Using Simple Close Algorithm

  • Author

    Zainol, Zurinahni ; Taib, Normaslina

  • Author_Institution
    Sch. of Comput. Sci., Univ. Sains Malaysia, Penang
  • fYear
    2006
  • fDate
    38838
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    Because of the exploratory nature of biological databases, both commercial and public, they suffer from drawbacks such as inaccurate, incomplete, duplicate, and outdated genome data. Work is needed to maintain the quality of genome data. Our objective in this paper is to provide a duplicate detection framework that focuses on duplicate records in a real-world Mice data protein database using the simple close algorithm (SCA). SCA is an enhancement of earlier work that used the a priori algorithm to generate rules. In our approach, we use a matrix structure to represent the data in the database. We implement SCA in the Java programming language on a 1.4 GHz Pentium 4 PC. Our results show that SCA produces fewer duplicate rules (that is, a non-redundant rule set) than a priori, without loss of information, and also improves execution time.
  • Keywords
    Java; biology computing; database management systems; mouse controllers (computers); proteins; Java programming language; Mice data protein database; Pentium 4 PC machine; SCA; a priori algorithm; duplicate detection framework; matrix structure; simple close algorithm; Bioinformatics; Biology computing; Computer languages; Databases; Error correction; Genomics; Java; Mice; Proteins; Vocabulary; Duplicate detection, protein data; Simple Close Algorithm (SCA) and duplicate protein data;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Distributed Frameworks for Multimedia Applications, 2006. The 2nd International Conference on
  • Conference_Location
    Pulau Pinang
  • Print_ISBN
    1-4244-0409-6
  • Type

    conf

  • DOI
    10.1109/DFMA.2006.296908
  • Filename
    4077733