• DocumentCode
    1834757
  • Title
    Searching for Rules to find Defective Modules in Unbalanced Data Sets
  • Author
    Rodriguez, David ; Riquelme, J.C. ; Ruiz, R. ; Aguilar-Ruiz, J.S.
  • Author_Institution
    Dept. of Comput. Sci., Univ. of Alcala, Alcala de Henares
  • fYear
    2009
  • fDate
    13-15 May 2009
  • Firstpage
    89
  • Lastpage
    92
  • Abstract
    The characterisation of defective modules in software engineering remains a challenge. In this work, we use data mining techniques to search for rules that indicate modules with a high probability of being defective. Using data sets from the PROMISE repository, we first apply feature selection (attribute selection) so as to work only with those attributes capable of predicting defective modules. On the reduced data set, a genetic algorithm then searches for rules characterising modules with a high probability of being defective. This algorithm overcomes the problem of unbalanced data sets, in which non-defective samples greatly outnumber defective ones.
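    This record does not specify the genetic algorithm's rule encoding or fitness function. As a minimal sketch, assuming conjunctive interval rules ("IF each selected attribute lies in its interval THEN defective") and weighted relative accuracy (WRAcc), a quality measure common in subgroup discovery (listed among the keywords), a rule search over the attributes kept by feature selection could look like this. The names `Rule`, `wracc`, and `search_rule`, the truncation-selection scheme, and all parameters are hypothetical illustrations, not the authors' implementation.

```python
import random
from dataclasses import dataclass


# Hypothetical rule encoding (an assumption, not taken from the paper):
# IF lo_i <= x[i] <= hi_i for every (i, lo_i, hi_i) THEN module is defective.
@dataclass(frozen=True)
class Rule:
    bounds: tuple  # tuple of (attribute_index, lo, hi)

    def covers(self, x):
        return all(lo <= x[i] <= hi for i, lo, hi in self.bounds)


def wracc(rule, X, y):
    """Weighted relative accuracy for the defective class (label 1).

    WRAcc = p(cond) * (p(defective | cond) - p(defective)).
    A rule scores above 0 only if defective modules are more frequent
    among the covered modules than in the whole (unbalanced) data set,
    so the majority non-defective class cannot dominate the score.
    """
    n = len(X)
    covered = [label for x, label in zip(X, y) if rule.covers(x)]
    if not covered:
        return 0.0
    p_cond = len(covered) / n
    p_pos_given_cond = sum(covered) / len(covered)
    p_pos = sum(y) / n
    return p_cond * (p_pos_given_cond - p_pos)


def random_rule(X, attrs, rng):
    # Each interval is bounded by two values observed in that attribute.
    bounds = []
    for i in attrs:
        col = [x[i] for x in X]
        a, b = rng.choice(col), rng.choice(col)
        bounds.append((i, min(a, b), max(a, b)))
    return Rule(tuple(bounds))


def mutate(rule, X, rng):
    # Replace one bound of one interval with another observed value.
    bounds = list(rule.bounds)
    k = rng.randrange(len(bounds))
    i, lo, hi = bounds[k]
    v = rng.choice([x[i] for x in X])
    lo, hi = (v, hi) if rng.random() < 0.5 else (lo, v)
    bounds[k] = (i, min(lo, hi), max(lo, hi))
    return Rule(tuple(bounds))


def crossover(r1, r2, rng):
    # Uniform crossover: each attribute's interval comes from either parent.
    return Rule(tuple(a if rng.random() < 0.5 else b
                      for a, b in zip(r1.bounds, r2.bounds)))


def search_rule(X, y, attrs, pop_size=30, generations=50, seed=0):
    """Simple elitist GA maximising WRAcc (illustrative parameters)."""
    rng = random.Random(seed)
    pop = [random_rule(X, attrs, rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda r: wracc(r, X, y), reverse=True)
        elite = pop[: pop_size // 2]  # best half survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            children.append(mutate(crossover(p1, p2, rng), X, rng))
        pop = elite + children
    return max(pop, key=lambda r: wracc(r, X, y))


if __name__ == "__main__":
    # Toy unbalanced data: (lines of code, complexity); 2 of 10 defective.
    X = [(10, 1), (12, 2), (15, 1), (20, 2), (22, 3),
         (25, 2), (30, 3), (35, 2), (200, 15), (250, 20)]
    y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
    best = search_rule(X, y, attrs=[0, 1])
    print(best, wracc(best, X, y))
```

    WRAcc is used here because plain accuracy rewards a rule that simply predicts "non-defective" everywhere on unbalanced data; WRAcc instead rewards rules whose defective rate exceeds the base rate. Whether the original work uses this measure is not stated in this record.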
  • Keywords
    data mining; genetic algorithms; probability; software reliability; PROMISE repository; data mining technique; defective module; feature selection; genetic algorithm; probability; rule searching; software engineering; unbalanced data set; Computer science; Data mining; Degradation; Electronic mail; Genetic algorithms; Pattern recognition; Robustness; Sampling methods; Software engineering; Defective Modules; Genetic Algorithm; Subgroup Discovery;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Search Based Software Engineering, 2009 1st International Symposium on
  • Conference_Location
    Windsor
  • Print_ISBN
    978-0-7695-3675-0
  • Type
    conf
  • DOI
    10.1109/SSBSE.2009.23
  • Filename
    5033185