• DocumentCode
    3348608
  • Title
    Pre-processing aspects for complexity reduction of the QSAR problem
  • Author
    Dumitriu, L. ; Segal, C. ; Craciun, M.-V. ; Cocu, A.
  • Author_Institution
    Comput. Sci. Dept., Dunarea de Jos Univ., Galati
  • Volume
    2
  • fYear
    2008
  • fDate
    6-8 Sept. 2008
  • Abstract
    Predictive Toxicology (PT) is one of the newest targets of the Knowledge Discovery in Databases (KDD) domain. Its goal is to describe the relationships between the chemical structure of chemical compounds and biological and toxicological processes. In real PT problems, one very important issue must be considered: the huge number of chemical descriptors. Irrelevant, redundant, noisy and unreliable data have a negative impact; therefore, one of the main goals in KDD is to detect these undesirable properties and to eliminate or correct them. This involves data cleaning, noise reduction and feature selection, because the performance of the applied Machine Learning algorithms is strongly related to the quality of the data used. In this paper, we present some of the issues that can be taken into account when preparing data before the actual knowledge discovery is performed.
  • Keywords
    chemistry computing; data mining; learning (artificial intelligence); toxicology; QSAR problem; chemical structure; complexity reduction; data cleaning; feature selection; knowledge discovery; machine learning; noise reduction; predictive toxicology; preprocessing aspects; Chemical compounds; Data mining; Databases; Neural networks; Noise reduction; Pattern analysis; Pattern recognition; Principal component analysis; Statistical analysis; Toxicology; knowledge discovery in databases; prediction; toxicology;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Intelligent Systems, 2008. IS '08. 4th International IEEE Conference
  • Conference_Location
    Varna
  • Print_ISBN
    978-1-4244-1739-1
  • Electronic_ISBN
    978-1-4244-1740-7
  • Type
    conf
  • DOI
    10.1109/IS.2008.4670547
  • Filename
    4670547