Title :
A Lightweight Approach to Uncover Technical Artifacts in Unstructured Data
Author :
Bettenburg, Nicolas ; Adams, Bram ; Hassan, Ahmed E. ; Smidt, Michel
Author_Institution :
Software Anal. & Intell. Lab., Queen's Univ., Kingston, ON, Canada
Abstract :
Developer communication through email, chat, or issue report comments consists largely of unstructured data, i.e., natural language text, mixed with technical artifacts such as project-specific jargon, abbreviations, source code patches, stack traces and identifiers. These technical artifacts represent a valuable source of knowledge on the technical part of the system, with a wide range of applications from establishing traceability links to creating project-specific vocabularies. However, the lack of well-defined boundaries between natural language and technical content makes the automated mining of technical artifacts challenging. As a first step towards a general-purpose technique for extracting technical artifacts from unstructured data, we present a lightweight approach to untangle technical artifacts and natural language text. Our approach is based on existing spell checking tools, which are well-understood, fast, readily available across platforms and impartial to different kinds of textual data. Through a handcrafted benchmark, we demonstrate that our approach is able to successfully uncover a wide range of technical artifacts in unstructured data.
Keywords :
data mining; data structures; natural language processing; program diagnostics; software engineering; text analysis; vocabulary; abbreviations; identifiers; natural language text; project-specific jargon; project-specific vocabularies; source code patches; spell checking tools; stack traces; technical artifacts mining; traceability links; unstructured data; Benchmark testing; Conferences; Data mining; Electronic mail; IEEE Computer Society; Natural languages; Software; language analysis; technical artifacts; text mining
Conference_Title :
Program Comprehension (ICPC), 2011 IEEE 19th International Conference on
Conference_Location :
Kingston, ON
Print_ISBN :
978-1-61284-308-7
Electronic_ISBN :
1092-8138
DOI :
10.1109/ICPC.2011.36