DocumentCode :
3694235
Title :
Is this code written in English? A study of the natural language of comments and identifiers in practice
Author :
Timo Pawelka;Elmar Juergens
Author_Institution :
Technische Universitä
fYear :
2015
Firstpage :
401
Lastpage :
410
Abstract :
Comments and identifiers are the main source of documentation in source code and are therefore an integral part of the development and maintenance of a program. As English is the de facto world language, most comments and identifiers are written in English. However, if they are written in any other language, a developer without knowledge of that language will perceive the code as almost undocumented or even obfuscated. In the absence of industrial data, academia is not aware of the extent of the problem of non-English comments and identifiers in practice. In this paper, we propose an approach for identifying the natural language of source-code comments and identifiers. Using this approach, we conducted a large-scale study of the natural language of source-code comments and identifiers, analyzing multiple open-source and industry systems. The results show that a significant number of the industry projects contain comments and identifiers in more than one language, whereas none of the analyzed open-source systems has this problem.
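The abstract does not describe how the language identification works. As an illustration only, the sketch below shows one simple, swapped-in technique: split identifiers on camelCase and snake_case boundaries, then count matches against per-language word lists. This is not the authors' approach. The word lists, the function names `split_identifier`, `identify_language` and `project_languages`, and the "more than one language" check are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical, tiny word lists for illustration only; a realistic detector
# would need far larger dictionaries or a statistical (e.g. n-gram) model.
WORDLISTS = {
    "en": {"the", "is", "and", "get", "set", "value", "count", "file",
           "name", "list", "of", "to", "if"},
    "de": {"der", "die", "das", "ist", "und", "wert", "anzahl", "datei",
           "liste", "von", "zu", "wenn", "berechne"},
}


def split_identifier(identifier: str) -> list[str]:
    """Split camelCase, PascalCase and snake_case identifiers into
    lowercase words (ASCII letters only in this sketch)."""
    words = []
    for part in re.split(r"[_\W\d]+", identifier):
        # An acronym run (e.g. "HTTP") or an optionally capitalized word.
        words.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", part))
    return [w.lower() for w in words if w]


def identify_language(tokens: list[str]) -> str:
    """Return the language whose word list matches the most tokens,
    or 'unknown' if no token matches any list."""
    hits = Counter()
    for token in tokens:
        for lang, words in WORDLISTS.items():
            if token in words:
                hits[lang] += 1
    if not hits:
        return "unknown"
    return hits.most_common(1)[0][0]


def project_languages(identifiers: list[str], comments: list[str]) -> set[str]:
    """Collect the languages detected across a project's identifiers and
    comments; more than one entry hints at a mixed-language code base."""
    langs = {identify_language(split_identifier(i)) for i in identifiers}
    langs |= {identify_language(re.findall(r"[a-z]+", c.lower()))
              for c in comments}
    langs.discard("unknown")
    return langs
```

For example, `project_languages(["getFileName", "berechneAnzahl"], ["Returns the list of files"])` yields `{"en", "de"}`, flagging the hypothetical project as mixing English and German.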
Keywords :
"Industries","Open source software","Natural languages","Radiation detectors","Maintenance engineering","Documentation","Programming"
Publisher :
IEEE
Conference_Titel :
Software Maintenance and Evolution (ICSME), 2015 IEEE International Conference on
Type :
conf
DOI :
10.1109/ICSM.2015.7332491
Filename :
7332491