• DocumentCode
    2856782
  • Title
    High-Dimensional Software Engineering Data and Feature Selection
  • Author
    Wang, Huanjing ; Khoshgoftaar, Taghi M. ; Gao, Kehan ; Seliya, Naeem
  • Author_Institution
    Western Kentucky Univ., Bowling Green, KY, USA
  • fYear
    2009
  • fDate
    2-4 Nov. 2009
  • Firstpage
    83
  • Lastpage
    90
  • Abstract
    Software metrics collected during project development play a critical role in software quality assurance. A software practitioner is very keen on learning which software metrics to focus on for software quality prediction. While a concise set of software metrics is often desired, a typical project collects a very large number of metrics. Minimal attention has been devoted to finding the minimum set of software metrics that have the same predictive capability as a larger set of metrics - we strive to answer that question in this paper. We present a comprehensive comparison between seven commonly-used filter-based feature ranking techniques (FRT) and our proposed hybrid feature selection (HFS) technique. Our case study consists of a very high-dimensional (42 software attributes) software measurement data set obtained from a large telecommunications system. The empirical analysis indicates that HFS performs better than FRT; however, the Kolmogorov-Smirnov feature ranking technique demonstrates competitive performance. For the telecommunications system, it is found that only 10% of the software attributes are sufficient for effective software quality prediction.
  • Keywords
    software metrics; software quality; Kolmogorov-Smirnov feature ranking technique; feature ranking techniques; feature selection; high-dimensional software engineering data; hybrid feature selection; large telecommunications system; project development; software attributes; software practitioner; software quality assurance; Artificial intelligence; Data mining; Filters; Machine learning; Power system modeling; Predictive models; Software engineering; Software measurement; feature ranking; high-dimensional data; quality prediction
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Tools with Artificial Intelligence, 2009. ICTAI '09. 21st International Conference on
  • Conference_Location
    Newark, NJ
  • ISSN
    1082-3409
  • Print_ISBN
    978-1-4244-5619-2
  • Electronic_ISBN
    1082-3409
  • Type
    conf
  • DOI
    10.1109/ICTAI.2009.20
  • Filename
    5365755