Title :
Software Defect Prediction Based on Conditional Random Field in Imbalance Distribution
Author :
Chunhui Yang;Yan Gao;Jianwen Xiang;Lixin Liang
Author_Institution :
Wuhan Univ. of Technol., Wuhan, China
Abstract :
To support software testing and reduce testing costs, a wide range of machine learning approaches have been studied for predicting defects in software modules. Unfortunately, the imbalanced nature of defect data makes this learning task more difficult. In this paper, we present UCRF, a method based on an undersampling technique and a conditional random field (CRF) for software defect prediction under an imbalanced class distribution. In the proposed method, we first use mean-shift clustering to reduce the number of majority-class samples and thereby balance the training data set. Second, we apply a CRF model to the balanced training data set, because a CRF can handle complex features without any change to its training procedure. On a software defect data classification task, UCRF achieves much better final results than the other approaches it is compared with.
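The abstract does not say how majority-class samples are selected once mean-shift clustering has run, so what follows is only a minimal sketch under stated assumptions. It uses a flat kernel with a hand-picked `bandwidth`. It then applies a hypothetical selection rule: keep the samples nearest each cluster center, taking them round-robin across clusters until the majority class matches the minority size. The names `mean_shift` and `undersample_majority` are illustrative and do not come from the paper. The CRF training step is not shown.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def mean_shift(points: Sequence[Point], bandwidth: float,
               max_iter: int = 100, tol: float = 1e-6) -> Tuple[List[Point], List[int]]:
    """Flat-kernel mean shift: each point repeatedly moves to the mean of all
    samples within `bandwidth`; converged modes closer than bandwidth / 2 are
    merged into a single cluster center."""
    modes: List[Point] = []
    for p in points:
        x = tuple(p)
        for _ in range(max_iter):
            neighbours = [q for q in points if math.dist(q, x) <= bandwidth]
            if not neighbours:  # defensive guard; some sample should always be in range
                break
            shifted = tuple(sum(coord) / len(neighbours) for coord in zip(*neighbours))
            moved = math.dist(shifted, x)
            x = shifted
            if moved < tol:
                break
        modes.append(x)

    # Merge nearby modes into cluster centers and assign a label to every point.
    centers: List[Point] = []
    labels: List[int] = []
    for m in modes:
        for k, c in enumerate(centers):
            if math.dist(m, c) < bandwidth / 2:
                labels.append(k)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return centers, labels


def undersample_majority(majority: Sequence[Point], n_target: int,
                         bandwidth: float) -> List[Point]:
    """Hypothetical rule: cluster the majority class, then keep the samples
    nearest each center, round-robin across clusters, until n_target remain."""
    if n_target >= len(majority):
        return list(majority)
    centers, labels = mean_shift(majority, bandwidth)
    # One queue per cluster, members ordered nearest-to-center first.
    queues = []
    for k, c in enumerate(centers):
        members = [p for p, lab in zip(majority, labels) if lab == k]
        members.sort(key=lambda p: math.dist(p, c))
        queues.append(members)
    kept: List[Point] = []
    depth = 0
    while len(kept) < n_target:
        for q in queues:
            if depth < len(q) and len(kept) < n_target:
                kept.append(q[depth])
        depth += 1
    return kept


# Toy usage: two majority clusters, two minority samples (label 1 = defective).
majority = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
            (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1)]
minority = [(2.5, 2.5), (2.6, 2.4)]
kept = undersample_majority(majority, len(minority), bandwidth=1.0)
balanced = [(p, 0) for p in kept] + [(p, 1) for p in minority]
print(len(balanced))  # 4
```

Taking samples round-robin keeps every majority-class cluster represented in the reduced set, which is one plausible reason to cluster before undersampling rather than dropping samples at random; the paper's actual selection rule may differ.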
Keywords :
"Software","Prediction algorithms","Data models","Algorithm design and analysis","Training","Hidden Markov models","Software algorithms"
Conference_Title :
Dependable Computing and Internet of Things (DCIT), 2015 2nd International Symposium on
DOI :
10.1109/DCIT.2015.21