DocumentCode
3603603
Title
The ManyBugs and IntroClass Benchmarks for Automated Repair of C Programs
Author
Le Goues, Claire ; Holtschulte, Neal ; Smith, Edward K. ; Brun, Yuriy ; Devanbu, Premkumar ; Forrest, Stephanie ; Weimer, Westley
Author_Institution
Sch. of Comput. Sci., Carnegie Mellon Univ., Pittsburgh, PA, USA
Volume
41
Issue
12
fYear
2015
Firstpage
1236
Lastpage
1256
Abstract
The field of automated software repair lacks a set of common benchmark problems. Although benchmark sets are used widely throughout computer science, existing benchmarks are not easily adapted to the problem of automatic defect repair, which has several special requirements. Most important of these is the need for benchmark programs with reproducible, important defects and a deterministic method for assessing whether those defects have been repaired. This article details the need for a new set of benchmarks, outlines requirements, and then presents two datasets, ManyBugs and IntroClass, consisting between them of 1,183 defects in 15 C programs. Each dataset is designed to support the comparative evaluation of automatic repair algorithms asking a variety of experimental questions. The datasets have empirically defined guarantees of reproducibility and benchmark quality, and each study object is categorized to facilitate qualitative evaluation and comparisons by category of bug or program. The article presents baseline experimental results on both datasets for three existing repair methods, GenProg, AE, and TrpAutoRepair, to reduce the burden on researchers who adopt these datasets for their own comparative evaluations.
Keywords
C language; benchmark testing; program debugging; software maintenance; software performance evaluation; software quality; C programs; GenProg; IntroClass; ManyBugs; TrpAutoRepair; automated software repair; automatic defect repair; automatic repair algorithms; benchmark problems; benchmark programs; benchmark quality; benchmark sets; computer science; defect assessment; deterministic method; qualitative evaluation; reproducibility; computer bugs; electronic mail; maintenance engineering; software systems; automated program repair; benchmark; subject defect
fLanguage
English
Journal_Title
IEEE Transactions on Software Engineering
Publisher
IEEE
ISSN
0098-5589
Type
jour
DOI
10.1109/TSE.2015.2454513
Filename
7153570