Title :
Sociolinguistics and programming
Author :
Fariha Naz;Jacqueline E. Rice
Author_Institution :
Department of Math and Computer Science, University of Lethbridge, Alberta, Canada
Abstract :
This paper applies machine learning techniques to the analysis of computer programs in order to infer information about an author's gender. Few existing studies address the relationship between linguistics and programming; however, in many areas where language is analyzed, it is possible to mine important information about the users of that language from a set of attributes or from their coding style. In this work we use open-source implementations of three machine learning algorithms: a nearest-neighbor classifier (K*), a decision tree (J48), and a Bayes classifier (Naïve Bayes). These algorithms were applied to C++ programs that were associated with sociolinguistic information about their authors, with the goal of classifying each program according to the gender of its author. Our initial results achieve a precision of 72.3%, a recall of 72%, and an F-measure of 71.9%, indicating that the gender of the authors of C++ programs can be predicted.
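The abstract reports precision, recall, and F-measure as single figures. A minimal sketch of how such figures can be computed from a confusion matrix follows. It assumes (this is not stated in the abstract) that the reported values are class-weighted averages, as Weka's summary output reports them. Under that assumption the F-measure is averaged per class, so it need not equal the harmonic mean of the averaged precision and recall. The function name `weighted_prf` and the example matrix are hypothetical and are not data from the paper.

```python
from typing import Dict, List


def weighted_prf(confusion: List[List[int]]) -> Dict[str, float]:
    """Class-weighted precision, recall, and F-measure (hypothetical helper).

    confusion[i][j] is the number of instances whose true class is i
    and whose predicted class is j. Each class is weighted by its
    support (number of true instances), assumed to mirror Weka's
    "weighted average" row.
    """
    total = sum(sum(row) for row in confusion)
    p_w = r_w = f_w = 0.0
    for c in range(len(confusion)):
        tp = confusion[c][c]
        actual = sum(confusion[c])                    # row sum: true instances of c
        predicted = sum(row[c] for row in confusion)  # column sum: predicted as c
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        # Per-class F-measure: harmonic mean of that class's precision and recall.
        f = (2 * precision * recall / (precision + recall)
             if precision + recall else 0.0)
        weight = actual / total
        p_w += weight * precision
        r_w += weight * recall
        f_w += weight * f
    return {"precision": p_w, "recall": r_w, "f_measure": f_w}


if __name__ == "__main__":
    # Hypothetical two-class matrix, for illustration only.
    print(weighted_prf([[40, 10], [20, 30]]))
```

For the hypothetical matrix `[[40, 10], [20, 30]]`, this gives a weighted precision of about 0.708, a recall of 0.700, and an F-measure of about 0.697. The harmonic mean of 0.708 and 0.700 is about 0.704, which shows why an averaged F-measure can sit slightly below both averaged precision and recall.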
Keywords :
"Pragmatics","Software","Measurement","Computers","Machine learning algorithms","Supervised learning","Software algorithms"
Conference_Title :
Communications, Computers and Signal Processing (PACRIM), 2015 IEEE Pacific Rim Conference on
Electronic_ISBN :
2154-5952
DOI :
10.1109/PACRIM.2015.7334812