Duplicate Detection for Protein Data Using Simple Close Algorithm

Author

Zainol, Zurinahni ; Taib, Normaslina

Author_Institution

Sch. of Comput. Sci., Univ. Sains Malaysia, Penang

fYear

2006

fDate

May 2006

Firstpage

1

Lastpage

6

Abstract

Because biological databases, both commercial and public, are exploratory in nature, they suffer from drawbacks such as inaccurate, incomplete, duplicate, and outdated genome data. Work is therefore needed to maintain the quality of genome data. Our objective in this paper is to provide a duplicate detection framework that focuses on duplicate records in a real-world mice protein database, using the simple close algorithm (SCA). SCA is an enhancement of earlier work that used the Apriori algorithm to generate rules. In our approach, we use a matrix structure to represent the data in the database. We implemented SCA in the Java programming language on a 1.4 GHz Pentium 4 PC. Our results show that SCA produces fewer duplicate rules (a non-redundant rule set) than Apriori without loss of information, and that it also improves execution time.

Keywords

Java; biology computing; database management systems; mouse controllers (computers); proteins; Java programming language; Mice data protein database; Pentium 4 PC machine; SCA; a priori algorithm; duplicate detection framework; matrix structure; simple close algorithm; Bioinformatics; Biology computing; Computer languages; Databases; Error correction; Genomics; Java; Mice; Proteins; Vocabulary; Duplicate detection, protein data; Simple Close Algorithm (SCA) and duplicate protein data;

fLanguage

English

Publisher

IEEE

Conference_Titel

Distributed Frameworks for Multimedia Applications, 2006. The 2nd International Conference on

Conference_Location

Pulau Pinang

Print_ISBN

1-4244-0409-6

Type

conf

DOI

10.1109/DFMA.2006.296908

Filename

4077733