Regional Information Center for Science and Technology - An Improved Structured and Progressive Electronic Dictionary for the Arabic Language: iSPEDAL

DocumentCode :

2527730

Title :

An Improved Structured and Progressive Electronic Dictionary for the Arabic Language: iSPEDAL

Author :

Hajjar, Mohammad ; El Salam Al Hajjar, Abd ; Zreik, Khaldoun ; Gallinari, Patrick

Author_Institution :

Inst. of Technol., Lebanese Univ., Beirut, Lebanon

fYear :

2010

fDate :

9-15 May 2010

Firstpage :

489

Lastpage :

495

Abstract :

In this article, we propose an improved structured and progressive electronic dictionary for the Arabic language (iSPEDAL), which can be represented either as a relational database or as an XML document and can therefore be easily exploited with suitable query languages. Many Arabic dictionaries exist, but they are neither structured nor directly exploitable, since they are stored as flat text files. iSPEDAL contains no duplicated data (roots, prefixes, suffixes, infixes, patterns, or derived words). Moreover, for a given word, it provides links to its root, its associated affixes, and its patterns. iSPEDAL is populated automatically from one or more traditional textual dictionaries and is continuously enriched from any Arabic text corpus by a system that we built. This system is composed of a Parser, a Selector, a Classifier, an Extractor, a Comparator, an Analyzer, and a Validator. The Parser transforms a textual source (dictionary or text corpus) into a set of words. The Selector determines whether a word is new or already exists in iSPEDAL. The Classifier classifies a given word and adds it to iSPEDAL as a root or as a derived word. The Extractor uses the Arabic extraction method to deduce the root of every word that reaches this component without its root or any indication of it. The Comparator prevents duplication of roots, affixes, or patterns in iSPEDAL. The Analyzer extracts the affixes and the pattern from a derived word and its root. The Validator validates the information (word, root, patterns, and affixes) before it is added to the iSPEDAL database. Because the vocabulary of the Arabic language is essentially built from roots, this dictionary can be used to evaluate methods for extracting information from Arabic documents.
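The abstract describes the storage model (each root, affix and pattern stored once, with every word linking to them) and the Selector, Analyzer and Comparator roles only in prose. The following is a minimal Python sketch of those ideas under our own assumptions, not the authors' implementation. The table and function names (`root`, `pattern`, `affix`, `word`, `analyze`, `get_or_create`, `add_word`) are hypothetical. Roots are assumed to be triliteral and supplied by the caller, standing in for the Extractor, whose method is not described here. The Analyzer is a naive greedy match of the radicals, not the paper's method. The Parser, Classifier and Validator are omitted.

```python
import sqlite3

# Hypothetical relational layout: every root, pattern and affix is stored once
# (UNIQUE constraints), and each word row links to them by id.
SCHEMA = """
CREATE TABLE root    (id INTEGER PRIMARY KEY, form TEXT UNIQUE NOT NULL);
CREATE TABLE pattern (id INTEGER PRIMARY KEY, form TEXT UNIQUE NOT NULL);
CREATE TABLE affix   (id INTEGER PRIMARY KEY, form TEXT NOT NULL,
                      kind TEXT NOT NULL CHECK (kind IN ('prefix', 'suffix')),
                      UNIQUE (form, kind));
CREATE TABLE word    (id INTEGER PRIMARY KEY, form TEXT UNIQUE NOT NULL,
                      root_id    INTEGER NOT NULL REFERENCES root(id),
                      pattern_id INTEGER NOT NULL REFERENCES pattern(id),
                      prefix_id  INTEGER REFERENCES affix(id),
                      suffix_id  INTEGER REFERENCES affix(id));
"""

SLOTS = "فعل"  # conventional placeholder letters (fa', 'ayn, lam) for writing patterns


def analyze(word, root):
    """Toy Analyzer (assumption): find the three radicals of `root` in order inside
    `word` (greedy, leftmost) and return (prefix, suffix, pattern)."""
    if len(root) != 3:
        raise ValueError("this sketch only handles triliteral roots")
    positions, start = [], 0
    for radical in root:
        pos = word.find(radical, start)
        if pos < 0:
            raise ValueError(f"{root!r} does not occur in order inside {word!r}")
        positions.append(pos)
        start = pos + 1
    letters = list(word)
    for slot, pos in zip(SLOTS, positions):
        letters[pos] = slot  # replace each radical with its placeholder letter
    return word[:positions[0]], word[positions[-1] + 1:], "".join(letters)


def get_or_create(conn, table, **cols):
    """Comparator role: return the id of an identical row, inserting it only if absent."""
    where = " AND ".join(f"{c} = ?" for c in cols)
    row = conn.execute(f"SELECT id FROM {table} WHERE {where}", tuple(cols.values())).fetchone()
    if row:
        return row[0]
    names, marks = ", ".join(cols), ", ".join("?" for _ in cols)
    cur = conn.execute(f"INSERT INTO {table} ({names}) VALUES ({marks})", tuple(cols.values()))
    return cur.lastrowid


def add_word(conn, word, root):
    """Selector + Analyzer + Comparator: skip known words, otherwise store the word
    linked to its shared root, pattern and affix rows. Returns True if added."""
    if conn.execute("SELECT 1 FROM word WHERE form = ?", (word,)).fetchone():
        return False  # Selector: the word is already in the dictionary
    prefix, suffix, pattern = analyze(word, root)
    root_id = get_or_create(conn, "root", form=root)
    pattern_id = get_or_create(conn, "pattern", form=pattern)
    prefix_id = get_or_create(conn, "affix", form=prefix, kind="prefix") if prefix else None
    suffix_id = get_or_create(conn, "affix", form=suffix, kind="suffix") if suffix else None
    conn.execute(
        "INSERT INTO word (form, root_id, pattern_id, prefix_id, suffix_id) VALUES (?, ?, ?, ?, ?)",
        (word, root_id, pattern_id, prefix_id, suffix_id),
    )
    return True


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    for w in ["كاتب", "مكتوب", "مكتبة", "كاتب"]:  # the second "كاتب" is a duplicate
        print(w, add_word(conn, w, "كتب"))
```

With these sample words, the root كتب and the prefix م are each stored once and shared by several words, and the repeated كاتب is skipped by the Selector check. Greedy matching can place radicals wrongly in words with repeated letters; a realistic analyzer would need pattern and affix tables.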

Keywords :

XML; dictionaries; natural language processing; query languages; relational databases; Arabic extraction method; Arabic language; Arabic textual corpus; XML document; analyzer; classifier; comparator; electronic dictionary; extractor; iSPEDAL dictionary; parser; relational database; selector; validator; Data mining; Database languages; Dictionaries; Informatics; Laboratories; Pattern analysis; Relational databases; Vocabulary; Web and internet services; Arabic Language; Corpus; Dictionary; Information Extraction; Root

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Internet and Web Applications and Services (ICIW), 2010 Fifth International Conference on

Conference_Location :

Barcelona

Print_ISBN :

978-1-4244-6728-0

Type :

conf

DOI :

10.1109/ICIW.2010.80

Filename :

5476494

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2527730