• DocumentCode
    3055733
  • Title
    Clustering and dimensionality reduction to determine important software quality metrics
  • Author
    Turan, Metin ; Çataltepe, Zehra
  • Author_Institution
    Kultur Univ. & Istanbul Tech. Univ., Istanbul
  • fYear
    2007
  • fDate
    7-9 Nov. 2007
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    Over the last two decades, research in software engineering has concentrated on quality. The best approach to quality evaluation is to determine well-defined metrics on software properties. One such property is module complexity, a view of the software that is related to how easily it can be modified. There has been work on constructing a metrics domain that measures module complexity. Generally, PCA (Principal Component Analysis) is used to define the principal metrics in the domain. Since there are usually no labels for software data, an unsupervised dimensionality reduction technique such as PCA needs to be used to determine the most important metrics. In this study, we determine the most important metrics by comparing the clustering obtained with a certain subset of metrics against the clustering obtained with the whole set of metrics. We measure the relative difference/similarity between clusterings using three different indices, namely Rand, Jaccard, and Fowlkes-Mallows. We use both backward feature selection and PCA for dimensionality reduction. On the publicly available NASA data, we find that using only 15 dimensions instead of the whole set of 42 metrics gives almost the same clustering performance. Therefore, a smaller number of software metrics could be used instead of the whole set to evaluate software quality.
  • Keywords
    data handling; pattern clustering; principal component analysis; software metrics; software quality; Fowlkes-Mallows; Jaccard; Rand; module complexity; principal metrics; quality evaluation; software engineering; software property; software quality metrics; unsupervised dimensionality reduction technique; Gaussian distribution; NASA; Object oriented modeling; Principal component analysis; Software engineering; Software maintenance; Software measurement; Software metrics; Software quality
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer and Information Sciences, 2007. ISCIS 2007. 22nd International Symposium on
  • Conference_Location
    Ankara
  • Print_ISBN
    978-1-4244-1363-8
  • Electronic_ISBN
    978-1-4244-1364-5
  • Type
    conf
  • DOI
    10.1109/ISCIS.2007.4456865
  • Filename
    4456865
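
The abstract compares the clustering obtained from a subset of metrics with the clustering obtained from the full set, using the Rand, Jaccard, and Fowlkes-Mallows indices. The sketch below shows one common pair-counting formulation of those three indices; it is an illustration only, not the authors' implementation. The function names `pair_counts` and `clustering_similarity`, the example labels, and the values returned when an index is undefined (no co-clustered pairs) are assumptions made here.

```python
from itertools import combinations
from math import sqrt


def pair_counts(labels_a, labels_b):
    """Classify every pair of points by whether each clustering groups it together."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same points")
    n11 = n10 = n01 = n00 = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a and same_b:
            n11 += 1  # together in both clusterings
        elif same_a:
            n10 += 1  # together only in the first clustering
        elif same_b:
            n01 += 1  # together only in the second clustering
        else:
            n00 += 1  # separated in both clusterings
    return n11, n10, n01, n00


def clustering_similarity(labels_a, labels_b):
    """Return the Rand, Jaccard, and Fowlkes-Mallows indices of two labelings."""
    n11, n10, n01, n00 = pair_counts(labels_a, labels_b)
    total = n11 + n10 + n01 + n00
    rand = (n11 + n00) / total if total else 1.0
    # Assumed convention: report 1.0 for Jaccard when neither clustering
    # places any pair together, and 0.0 for Fowlkes-Mallows when its
    # denominator is zero.
    union = n11 + n10 + n01
    jaccard = n11 / union if union else 1.0
    denom = sqrt((n11 + n10) * (n11 + n01))
    fowlkes_mallows = n11 / denom if denom else 0.0
    return {"rand": rand, "jaccard": jaccard, "fowlkes_mallows": fowlkes_mallows}


# Hypothetical cluster labels for six modules: one labeling from the full
# metric set, one from a reduced metric set.
full_set = [0, 0, 1, 1, 2, 2]
reduced_set = [0, 0, 1, 1, 1, 2]
scores = clustering_similarity(full_set, reduced_set)
```

All three indices depend only on which points share a cluster, so renaming the cluster labels does not change the scores. A value near 1 for the reduced set would suggest that the dropped metrics contribute little to the clustering.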