DocumentCode :
1230761
Title :
Incremental Evolution of Fuzzy Grammar Fragments to Enhance Instance Matching and Text Mining
Author :
Martin, Trevor ; Shen, Yun ; Azvine, B.
Author_Institution :
Artificial Intell. Group, Univ. of Bristol, Bristol
Volume :
16
Issue :
6
fYear :
2008
Firstpage :
1425
Lastpage :
1438
Abstract :
In many applications, it is useful to extract structured data from sections of unstructured text. A common approach is to use pattern matching (e.g., regular expressions) or more general grammar-based techniques. In cases where exact templates or grammar fragments are not known, it is possible to use machine learning approaches, based on words or n-grams, to identify the structured data. This is generally a two-stage (train/use) process that cannot easily cope with incremental extensions of the training set. In this paper, we combine a fuzzy grammar-based approach with incremental learning. This enables a set of grammar fragments to evolve incrementally, each time a new example is given, while guaranteeing that it can parse previously seen examples. We propose a novel measure of overlap between fuzzy grammar fragments that can also be used to determine the degree to which a string is parsed by a grammar fragment. This measure of overlap allows us to compare the range of two fuzzy grammar fragments (i.e., to estimate and compare the sets of strings that fuzzily conform to each grammar) without explicitly parsing any strings. A simple application shows the method's validity.
Keywords :
XML; data structures; fuzzy set theory; grammars; learning (artificial intelligence); string matching; text analysis; fuzzy grammar fragments; grammar-based technique; incremental learning; instance matching; machine learning; pattern matching; string parsing; structured data; text mining; entity extraction; evolving system; fuzzy sets; grammar fragments; tagging; extensible markup language (XML);
fLanguage :
English
Journal_Title :
Fuzzy Systems, IEEE Transactions on
Publisher :
IEEE
ISSN :
1063-6706
Type :
jour
DOI :
10.1109/TFUZZ.2008.925920
Filename :
4529087