DocumentCode
3341907
Title
Duplicate Detection for Protein Data Using Simple Close Algorithm
Author
Zainol, Zurinahni ; Taib, Normaslina
Author_Institution
Sch. of Comput. Sci., Univ. Sains Malaysia, Penang
fYear
2006
fDate
38838
Firstpage
1
Lastpage
6
Abstract
Because of their exploratory nature, biological databases, both commercial and public, suffer from drawbacks such as inaccurate, incomplete, duplicate, and outdated genome data. Some work must be done to maintain the quality of genome data. Our objective in this paper is to provide a duplicate detection framework that focuses on duplicate records in a real-world mice protein database using the Simple Close Algorithm (SCA). SCA is an enhancement of earlier work that used the Apriori algorithm to generate the rules. In our approach, we use a matrix structure to represent the data in the database. We implement SCA in the Java programming language on a 1.4 GHz Pentium 4 PC. Our results show that SCA produces fewer duplicate rules (a non-redundant rule set) than Apriori without loss of information, and also improves the execution time.
Keywords
Java; biology computing; database management systems; mouse controllers (computers); proteins; Java programming language; Mice data protein database; Pentium 4 PC machine; SCA; a priori algorithm; duplicate detection framework; matrix structure; simple close algorithm; Bioinformatics; Biology computing; Computer languages; Databases; Error correction; Genomics; Java; Mice; Proteins; Vocabulary; Duplicate detection, protein data; Simple Close Algorithm (SCA) and duplicate protein data;
fLanguage
English
Publisher
IEEE
Conference_Titel
Distributed Frameworks for Multimedia Applications, 2006. The 2nd International Conference on
Conference_Location
Pulau Pinang
Print_ISBN
1-4244-0409-6
Type
conf
DOI
10.1109/DFMA.2006.296908
Filename
4077733