Using semi-supervised clustering to improve regression test selection techniques

Author

Chen, Songyu ; Chen, Zhenyu ; Zhao, Zhihong ; Xu, Baowen ; Feng, Yang

Author_Institution

State Key Lab. for Novel Software Technol., Nanjing Univ., Nanjing, China

fYear

2011

fDate

21-25 March 2011

Firstpage

1

Lastpage

10

Abstract

Cluster test selection has been proposed as an efficient regression testing approach. It uses distance measures and clustering algorithms to group tests into clusters. Tests in the same cluster are considered to behave similarly. A sampling strategy is then applied to the clustering result to build a small subset of tests, which is expected to approximate the fault detection capability of the original test set. All existing cluster test selection methods employ unsupervised clustering: previous test results are not used during clustering, which may lead to unsatisfactory clustering results in some cases. In this paper, a semi-supervised clustering method, semi-supervised K-means (SSKM), is introduced to improve cluster test selection. SSKM uses limited supervision in the form of pairwise constraints: Must-link and Cannot-link. These pairwise constraints are derived from previous test results to improve both the clustering results and the test selection results. The experimental results illustrate the effectiveness of cluster test selection with SSKM. Two observations are drawn from the analysis: (1) cluster test selection with SSKM is more effective when the proportion of failed tests is medium; (2) a strict definition of the pairwise constraints can improve the effectiveness of cluster test selection with SSKM.
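
The idea of steering K-means with Must-link and Cannot-link constraints can be illustrated with a minimal Python sketch. It is not the SSKM algorithm from the paper, whose details the abstract does not give. The rule for deriving constraints (two previously failed tests are Must-linked, a failed and a passed test are Cannot-linked), the greedy constraint check, and all function names are assumptions made for illustration only. Each test is assumed to be represented by a numeric execution profile.

```python
from itertools import combinations
from math import dist


def derive_constraints(previous_failed):
    """Build pairwise constraints from previous pass/fail outcomes.

    previous_failed maps a test index to True (failed) or False (passed).
    Assumed rule (hypothetical, not taken from the paper): two failed
    tests get a Must-link, a failed and a passed test get a Cannot-link.
    """
    must, cannot = set(), set()
    for a, b in combinations(sorted(previous_failed), 2):
        if previous_failed[a] and previous_failed[b]:
            must.add((a, b))
        elif previous_failed[a] != previous_failed[b]:
            cannot.add((a, b))
    return must, cannot


def violates(i, cluster, labels, must, cannot):
    """Return True if putting test i into `cluster` breaks a constraint."""
    for a, b in must:
        other = b if a == i else a if b == i else None
        # An unassigned partner cannot be violated yet.
        if other is not None and labels.get(other, cluster) != cluster:
            return True
    for a, b in cannot:
        other = b if a == i else a if b == i else None
        if other is not None and labels.get(other) == cluster:
            return True
    return False


def constrained_kmeans(profiles, centroids, must, cannot, iterations=10):
    """Greedy constraint-aware K-means (illustrative sketch only)."""
    centroids = [list(c) for c in centroids]
    labels = {}
    for _ in range(iterations):
        labels = {}
        for i, p in enumerate(profiles):
            # Clusters ordered from nearest to farthest centroid.
            order = sorted(range(len(centroids)),
                           key=lambda c: dist(p, centroids[c]))
            # Nearest cluster that respects the constraints;
            # fall back to the nearest cluster if none does.
            labels[i] = next(
                (c for c in order
                 if not violates(i, c, labels, must, cannot)),
                order[0],
            )
        for c in range(len(centroids)):
            members = [profiles[i] for i in labels if labels[i] == c]
            if members:  # keep the old centroid for an empty cluster
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
    return labels


# Hypothetical data: five tests with two-dimensional execution profiles.
profiles = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6)]
# Previous results are known for three tests only (limited supervision).
previous_failed = {0: False, 2: True, 3: True}
must, cannot = derive_constraints(previous_failed)
labels = constrained_kmeans(profiles, [(0, 0), (5, 5)], must, cannot)
```

In this example, test 2 lies closest to the passed test 0, so plain K-means would group them together. The constraints derived from the previous results instead move test 2 into the cluster of the other failed test. A sampling strategy could then pick tests from each resulting cluster.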

Keywords

pattern clustering; regression analysis; statistical testing; cannot-link; cluster test selection; clustering algorithm; fault detection; must-link; pair wise constraint; regression test selection technique; regression testing; semisupervised K-means; semisupervised clustering; unsupervised clustering; Clustering methods; Euclidean distance; Flexible printed circuits; Machine learning; Software testing; K-means; Test selection; pairwise constraint; semi-supervised clustering

fLanguage

English

Publisher

IEEE

Conference_Titel

Software Testing, Verification and Validation (ICST), 2011 IEEE Fourth International Conference on

Conference_Location

Berlin

Print_ISBN

978-1-61284-174-8

Electronic_ISBN

978-0-7695-4342-0

Type

conf

DOI

10.1109/ICST.2011.38

Filename

5770589