• DocumentCode
    462085
  • Title

    BioDArt - catalogue of biological data artifact examples

  • Author

    Veeramani, A. ; Gopalakrishnan, K. ; Brusic, V.

  • Author_Institution
    Inst. for Infocomm Res., Singapore
  • fYear
    2006
  • fDate
    11-14 Dec. 2006
  • Abstract
    Information in biological data repositories continues to grow exponentially due to the increasing number of genomic and proteomic sequencing projects. As with any database, these data repositories are subject to data quality issues related to correctness, uniformity, completeness, and redundancy, among others. Data cleaning is a prerequisite to prevent low-quality data from interfering with the accuracy of data mining and analysis. This in turn involves the detection and resolution of data artifacts (errors, discrepancies, redundancies, ambiguities, and incompleteness). Understanding the causes of data artifacts and systematically classifying them are critical to their elimination in molecular sequence databases. This paper highlights eight data artifacts found among public molecular databases. Examples of major molecular sequence database records containing these artifacts are collected into the BioDArt catalogue (http://antigen.i2r.a-star.edu.sg/BioDArt).
  • Keywords
    biology computing; data analysis; data integrity; data mining; database management systems; genetics; proteins; scientific information systems; BioDArt catalogue; bioinformation; biological data artifacts; biological data repositories; data analysis; data cleaning; data mining; data quality; genomic sequences; molecular sequence databases; proteomic sequencing project; redundancies;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Biomedical and Pharmaceutical Engineering, 2006. ICBPE 2006. International Conference on
  • Conference_Location
    Singapore
  • Print_ISBN
    978-981-05-79
  • Electronic_ISBN
    978-981-05-79
  • Type

    conf

  • Filename
    4155917