Regional Information Center for Science and Technology - Categories of Source Code in Industrial Systems

Abstract:

The categorization of source code artifacts affects how the overall product is measured and, consequently, how these measurements are interpreted. When measuring complexity, for instance, failing to distinguish test code and generated code from the rest will distort the measurements, possibly leading to an erroneous interpretation of the overall product complexity. Although categorization problems are known, this subject seems to have received little attention in the literature. In this paper, we introduce a categorization for source code artifacts and present an empirical study providing evidence of each category. Artifacts are divided into production and test code, and each of these categories is then subdivided into manually maintained, generated, library, and example code. By analyzing 80 Java and C# industrial systems, we have found evidence of the majority of categories. We show that, on average, production code accounts for only 60% of a product's volume. We have also found that, for some systems, test code and generated code can each account for over 70% of the overall volume, and library code for over 40%. Finally, we discuss the difficulties of distinguishing source code artifacts and conclude with directions for further research.
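To make the two-level scheme concrete, the following is a minimal sketch of how artifacts might be assigned to a (purpose, origin) pair and how each category's share of product volume could be computed. It is not the procedure used in the study: the path-based rules, the directory names they look for, the `Artifact` type, the use of line counts as the volume measure, and the sample numbers are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Artifact:
    path: str
    lines: int  # assumed volume proxy; the abstract does not state the unit


def classify(path):
    """Guess (purpose, origin) from the path alone.

    Hypothetical heuristics: real systems often need manual inspection,
    which is the difficulty the abstract alludes to.
    """
    parts = path.lower().replace("\\", "/").split("/")
    # First level: production vs. test code.
    purpose = "test" if any(p in ("test", "tests") for p in parts) else "production"
    # Second level: manually maintained, generated, library, or example code.
    if any(p in ("generated", "gen") for p in parts):
        origin = "generated"
    elif any(p in ("lib", "vendor", "third_party") for p in parts):
        origin = "library"
    elif any(p in ("examples", "samples") for p in parts):
        origin = "example"
    else:
        origin = "manual"
    return purpose, origin


def volume_shares(artifacts):
    """Return each category's fraction of the total volume."""
    totals = defaultdict(int)
    for artifact in artifacts:
        totals[classify(artifact.path)] += artifact.lines
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {category: n / grand_total for category, n in totals.items()}


# Made-up system, not data from the study.
system = [
    Artifact("src/main/java/app/Order.java", 500),
    Artifact("src/test/java/app/OrderTest.java", 300),
    Artifact("src/main/generated/app/Parser.java", 150),
    Artifact("lib/json/Json.java", 50),
]
shares = volume_shares(system)
```

Reporting shares per category, rather than a single total, is what allows a metric such as complexity to be computed over manually maintained production code alone.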