Title :
Mining Arabic text using soft-matching association rules
Author :
Al-Zoghby, Aya ; Eldin, Ahmed Sharaf ; Ismail, Nabil A. ; Hamza, Taher
Author_Institution :
Mansoura Univ., Mansoura
Abstract :
Text mining concerns the discovery of knowledge from unstructured textual data. One important task is the discovery of rules that relate specific words and phrases. Textual entries in many database fields exhibit minor variations that may prevent mining algorithms from discovering important patterns. Variations can arise from typographical errors, misspellings, and abbreviations, as well as from other sources such as ambiguity. Ambiguity may be due to the derivation feature, which is very common in the Arabic language. This paper introduces a new system developed to discover soft-matching association rules using a similarity measure based on the derivation feature of the Arabic language. In addition, it presents the features of using the Frequent Closed Itemsets (FCI) concept, rather than Frequent Itemsets (FI), in mining association rules.
Keywords :
data mining; natural language processing; string matching; text analysis; Arabic language; Arabic text mining; abbreviations; derivation feature; frequent closed item-sets; knowledge discovery; misspellings; pattern discovery; similarity measurement; soft-matching association rules; typographical error; unstructured textual data; Amorphous materials; Association rules; Data mining; Deductive databases; Explosives; Itemsets; Spatial databases; Text mining; Vehicles; Web pages;
Conference_Title :
Computer Engineering & Systems, 2007. ICCES '07. International Conference on
Conference_Location :
Cairo
Print_ISBN :
978-1-4244-1365-2
Electronic_ISBN :
978-1-4244-1366-9
DOI :
10.1109/ICCES.2007.4447080