DocumentCode :
3091014
Title :
Analyzing software quality with limited fault-proneness defect data
Author :
Seliya, Naeem ; Khoshgoftaar, Taghi M. ; Zhong, Shi
Author_Institution :
Comput. & Inf. Sci., Michigan Univ., Dearborn, MI, USA
fYear :
2005
fDate :
12-14 Oct. 2005
Firstpage :
89
Lastpage :
98
Abstract :
Ensuring that the desired software quality and reliability are met for a project is as important as delivering it within the scheduled budget and time. This is especially vital for high-assurance software systems, where software failures can have severe consequences. To achieve the desired software quality, practitioners use software quality models to identify high-risk program modules; for example, software quality classification models are built using training data consisting of software measurements and fault-proneness data from previous development experiences similar to the project currently under development. However, various practical issues can limit the availability of fault-proneness data for all modules in the training data, leaving many modules with no fault-proneness data, i.e., unlabeled. To address this problem, we propose a novel semi-supervised clustering scheme for software quality analysis with limited fault-proneness data. It is a constraint-based semi-supervised clustering scheme based on the k-means algorithm. The proposed approach is investigated with software measurement data from two NASA software projects, JM1 and KC2. Empirical results validate the promise of our semi-supervised clustering technique for software quality modeling and analysis in the presence of limited defect data. Additionally, the approach provides valuable insight into the characteristics of certain program modules that remain unlabeled after our semi-supervised clustering analysis.
Keywords :
software fault tolerance; software metrics; software quality; fault-proneness defect data; high-assurance software system; high-risk program module; k-means algorithm; semisupervised clustering scheme; software failure; software measurement; software quality classification model; software reliability; Information science; Predictive models; Processor scheduling; Software engineering; Software measurement; Software quality; Software systems; Software tools; Training data; USA Councils; k-means; semi-supervised clustering; software faults; software measurements; software quality;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
High-Assurance Systems Engineering, 2005. HASE 2005. Ninth IEEE International Symposium on
Conference_Location :
Heidelberg, Germany
ISSN :
1530-2059
Print_ISBN :
0-7695-2377-3
Type :
conf
DOI :
10.1109/HASE.2005.4
Filename :
1581286