DocumentCode :
2527730
Title :
An Improved Structured and Progressive Electronic Dictionary for the Arabic Language: iSPEDAL
Author :
Hajjar, Mohammad ; El Salam Al Hajjar, Abd ; Zreik, Khaldoun ; Gallinari, Patrick
Author_Institution :
Inst. of Technol., Lebanese Univ., Beirut, Lebanon
fYear :
2010
fDate :
9-15 May 2010
Firstpage :
489
Lastpage :
495
Abstract :
In this article, we propose an improved structured and progressive electronic dictionary for the Arabic language (iSPEDAL), which can be presented either as a relational database or as an XML document that is easily exploited using suitable query languages. Many Arabic dictionaries exist, but they are unstructured and not directly exploitable because they are stored as flat text files. iSPEDAL does not contain any duplicated data (roots, prefixes, suffixes, infixes, patterns, or derived words). Moreover, for a given word, it provides links to its root, its associated affixes, and its patterns. iSPEDAL is populated automatically from one or several traditional textual dictionaries and is continually enriched from any Arabic textual corpus using a system that we built. This system is composed of a Parser, a Selector, a Classifier, an Extractor, a Comparator, an Analyzer, and a Validator. The Parser transforms a textual source (dictionary or textual corpus) into a set of words. The Selector determines whether a word is new or already exists in iSPEDAL. The Classifier classifies a given word and adds it to iSPEDAL as a root or as a derived word. The Extractor uses the Arabic extraction method to deduce the root of every word that reaches this component without its root or any indication of its root. The Comparator prevents duplication of roots, affixes, or patterns in iSPEDAL. The Analyzer extracts the affixes and the pattern from a derived word and its root. The Validator validates the information (word, root, patterns, and affixes) before it is added to the iSPEDAL database. This dictionary can be used to evaluate methods for extracting information from Arabic documents, given that the vocabulary of the Arabic language is essentially built from roots.
Keywords :
XML; dictionaries; natural language processing; query languages; relational databases; Arabic extraction method; Arabic language; Arabic textual corpus; XML document; analyzer; classifier; comparator; electronic dictionary; extractor; iSPEDAL dictionary; parser; query languages; relational database; selector; validator; Data mining; Database languages; Dictionaries; Informatics; Laboratories; Pattern analysis; Relational databases; Vocabulary; Web and internet services; XML; Arabic Language; Corpus; Dictionary; Information Extraction; Root;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Internet and Web Applications and Services (ICIW), 2010 Fifth International Conference on
Conference_Location :
Barcelona
Print_ISBN :
978-1-4244-6728-0
Type :
conf
DOI :
10.1109/ICIW.2010.80
Filename :
5476494