DocumentCode
3055733
Title
Clustering and dimensionality reduction to determine important software quality metrics
Author
Turan, Metin ; Çataltepe, Zehra
Author_Institution
Kultur Univ. & Istanbul Tech. Univ., Istanbul
fYear
2007
fDate
7-9 Nov. 2007
Firstpage
1
Lastpage
6
Abstract
Over the last two decades, software engineering research has concentrated on quality. Quality evaluation is best approached by determining well-defined metrics on software properties. One such property is module complexity, a view of the software that relates to how easily it can be modified. There has been work on constructing a metrics domain that measures module complexity. Generally, PCA (Principal Component Analysis) is used to define the principal metrics in the domain. Since software data usually has no labels, an unsupervised dimensionality reduction technique, such as PCA, needs to be used to determine the most important metrics. In this study, we determine the most important metrics by comparing the clustering obtained with a certain subset of the metrics against the clustering obtained with the whole set. We measure the relative difference/similarity between clusterings using three different indices, namely Rand, Jaccard and Fowlkes-Mallows. We use both backward feature selection and PCA for dimensionality reduction. On publicly available NASA data, we find that using only 15 dimensions instead of the whole set of 42 metrics yields almost the same clustering performance. Therefore, a smaller number of software metrics could be used instead of the whole set to evaluate software quality.
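The three indices named in the abstract are all pair-counting measures, so the comparison between a full-metric clustering and a reduced-metric clustering can be illustrated briefly. The following is a minimal sketch, not the authors' implementation: the function names and the toy label lists are hypothetical, and the labels stand in for cluster assignments that would come from whatever clustering algorithm is used.

```python
from itertools import combinations
from math import sqrt


def pair_counts(labels_a, labels_b):
    """Classify every pair of points by whether each clustering groups it together.

    Returns (ss, sd, ds, dd):
      ss = same cluster in both, sd = same in A only,
      ds = same in B only,       dd = different in both.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same points")
    ss = sd = ds = dd = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a and same_b:
            ss += 1
        elif same_a:
            sd += 1
        elif same_b:
            ds += 1
        else:
            dd += 1
    return ss, sd, ds, dd


def rand_index(labels_a, labels_b):
    """Fraction of pairs on which the two clusterings agree."""
    ss, sd, ds, dd = pair_counts(labels_a, labels_b)
    total = ss + sd + ds + dd
    return (ss + dd) / total if total else 1.0


def jaccard_index(labels_a, labels_b):
    """Like Rand, but ignores pairs separated in both clusterings."""
    ss, sd, ds, _ = pair_counts(labels_a, labels_b)
    denom = ss + sd + ds
    return ss / denom if denom else 1.0


def fowlkes_mallows_index(labels_a, labels_b):
    """Geometric mean of pairwise precision and recall."""
    ss, sd, ds, _ = pair_counts(labels_a, labels_b)
    denom = sqrt((ss + sd) * (ss + ds))
    # Assumed convention: return 0.0 when either clustering has no co-clustered pair.
    return ss / denom if denom else 0.0


# Hypothetical cluster labels for six modules (illustrative only).
full_metric_labels = [0, 0, 0, 1, 1, 1]    # clustering on all metrics
subset_metric_labels = [0, 0, 1, 1, 1, 1]  # clustering on a metric subset

print(rand_index(full_metric_labels, subset_metric_labels))
print(jaccard_index(full_metric_labels, subset_metric_labels))
print(fowlkes_mallows_index(full_metric_labels, subset_metric_labels))
```

All three indices equal 1 when the two clusterings are identical up to relabeling, so values close to 1 for a reduced metric set suggest that the dropped metrics contributed little to the cluster structure.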
Keywords
data handling; pattern clustering; principal component analysis; software metrics; software quality; Fowlkes-Mallows; Jaccard; Rand; module complexity; principal component analysis; principal metrics; quality evaluation; software engineering; software property; software quality metrics; unsupervised dimensionality reduction technique; Gaussian distribution; NASA; Object oriented modeling; Principal component analysis; Software engineering; Software maintenance; Software measurement; Software metrics; Software quality
fLanguage
English
Publisher
IEEE
Conference_Titel
Computer and Information Sciences, 2007. ISCIS 2007. 22nd International Symposium on
Conference_Location
Ankara
Print_ISBN
978-1-4244-1363-8
Electronic_ISBN
978-1-4244-1364-5
Type
conf
DOI
10.1109/ISCIS.2007.4456865
Filename
4456865