DocumentCode
2601016
Title
Extracting structured data from natural language documents with island parsing
Author
Bacchelli, Alberto ; Cleve, Anthony ; Lanza, Michele ; Mocci, Andrea
Author_Institution
Fac. of Inf., Univ. of Lugano, Lugano, Switzerland
fYear
2011
fDate
6-10 Nov. 2011
Firstpage
476
Lastpage
479
Abstract
The design and evolution of a software system leave traces in various kinds of artifacts. In software, produced by humans for humans, many artifacts are written in natural language by people involved in the project. Such entities contain structured information, which constitutes a valuable source of knowledge for analyzing and comprehending a system's design and evolution. However, the ambiguous and informal nature of narrative is a serious challenge in gathering such information, which is scattered throughout natural language text. We present an approach, based on island parsing, to recognize and enable the parsing of structured information that occurs in natural language artifacts. We evaluate our approach by applying it to mailing lists pertaining to three software systems. We show that this approach allows us to extract structured data from emails with high precision and recall.
Keywords
data mining; grammars; natural language processing; island parsing; natural language document; structured information; Data mining; Electronic mail; Grammar; History; Java; Natural languages; Production;
fLanguage
English
Publisher
IEEE
Conference_Titel
Automated Software Engineering (ASE), 2011 26th IEEE/ACM International Conference on
Conference_Location
Lawrence, KS
ISSN
1938-4300
Print_ISBN
978-1-4577-1638-6
Type
conf
DOI
10.1109/ASE.2011.6100103
Filename
6100103