Title :
Duplicate Detection for Protein Data Using Simple Close Algorithm
Author :
Zainol, Zurinahni ; Taib, Normaslina
Author_Institution :
Sch. of Comput. Sci., Univ. Sains Malaysia, Penang
Abstract :
Because of their exploratory nature, biological databases, both commercial and public, suffer from drawbacks such as inaccurate, incomplete, duplicate, and outdated genome data. Some work must be done to maintain the quality of genome data. Our objective in this paper is to provide a duplicate detection framework that focuses on duplicate records in a real-world mice protein database, using the simple close algorithm (SCA). SCA is an enhancement of prior work that used the Apriori algorithm to generate rules. In our approach we use a matrix structure to represent the data in a database. We implement SCA in the Java programming language on a 1.4 GHz Pentium 4 PC. Our results show that SCA produces fewer duplicate rules (a non-redundant set) than Apriori without loss of information, and that it also improves execution time.
Keywords :
Java; biology computing; database management systems; mouse controllers (computers); proteins; Java programming language; Mice data protein database; Pentium 4 PC machine; SCA; a priori algorithm; duplicate detection framework; matrix structure; simple close algorithm; Bioinformatics; Computer languages; Databases; Error correction; Genomics; Mice; Vocabulary; Duplicate detection; protein data; Simple Close Algorithm (SCA); duplicate protein data
Conference_Title :
Distributed Frameworks for Multimedia Applications, 2006. The 2nd International Conference on
Conference_Location :
Pulau Pinang
Print_ISBN :
1-4244-0409-6
DOI :
10.1109/DFMA.2006.296908