DocumentCode
265308
Title
Extracting N-gram terms collocation from tagged Arabic corpus
Author
Alromima, Waseem ; Moawad, Ibrahim F. ; Elgohary, Rania ; Aref, Mostafa
Author_Institution
Fac. of Comput. & Inf. Sci., Ain Shams Univ., Cairo, Egypt
fYear
2014
fDate
15-17 Dec. 2014
Abstract
Information Extraction (IE) is one of the most important Natural Language Processing (NLP) applications; it extracts information such as Named Entities (NE) and term collocations from a corpus. A collocation is a sequence of terms that co-occur in the corpus. Arabic Information Extraction faces many problems because of the complexity and ambiguity of Arabic grammar. In linguistics research, a corpus annotated with Part-of-Speech Tagging (POST) is generally the more efficient one to work with. In this paper, we propose a prototype that extracts collocations of N-gram words (2- to 6-gram) from the Arabic Quran corpus based on sequences of POST tags. The approach extracts N-gram collocations by matching an input structured pattern of the Arabic language against the Part-of-Speech tags of the Quran corpus. The system lets users select a sequence of tags (2- to 6-gram) and the scope of the corpus source (the whole Quran corpus or a specific Surah). To show how the system benefits linguistic research, a set of experiments has been conducted in different scenarios.
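The matching step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `extract_collocations`, the corpus layout (a dict from a Surah identifier to a list of `(word, tag)` pairs), the tag labels, and the toy English data are all assumptions made for illustration. The sketch slides a window of the pattern's length over each selected Surah and keeps the word sequences whose tags equal the requested pattern.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Token = Tuple[str, str]  # (word, POS tag); the tag names used below are hypothetical


def extract_collocations(
    corpus: Dict[int, List[Token]],
    pattern: Sequence[str],
    surah: Optional[int] = None,
) -> List[Tuple[int, Tuple[str, ...]]]:
    """Return (surah_id, words) for every window whose tags equal `pattern`.

    `corpus` maps a hypothetical Surah id to its tagged tokens.
    `surah=None` searches the whole corpus; otherwise only that Surah.
    """
    n = len(pattern)
    if not 2 <= n <= 6:
        raise ValueError("pattern must contain between 2 and 6 tags")
    target = tuple(pattern)

    # Restrict the search scope if a specific Surah was requested.
    scope = corpus if surah is None else {surah: corpus[surah]}

    matches: List[Tuple[int, Tuple[str, ...]]] = []
    for surah_id, tokens in scope.items():
        # Windows never cross a Surah boundary because each Surah is scanned separately.
        for i in range(len(tokens) - n + 1):
            window = tokens[i:i + n]
            if tuple(tag for _, tag in window) == target:
                matches.append((surah_id, tuple(word for word, _ in window)))
    return matches


# Toy data standing in for a tagged corpus (not taken from the Quran corpus).
toy_corpus = {
    1: [("the", "DET"), ("great", "ADJ"), ("book", "N"),
        ("is", "V"), ("old", "ADJ"), ("book", "N")],
    2: [("new", "ADJ"), ("house", "N"), ("stands", "V")],
}

whole = extract_collocations(toy_corpus, ["ADJ", "N"])
only_second = extract_collocations(toy_corpus, ["ADJ", "N"], surah=2)
```

Scanning each Surah separately is a design choice of this sketch; it prevents a match from spanning the end of one Surah and the start of the next.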
Keywords
computational linguistics; grammars; natural language processing; Arabic Quran corpus; Arabic grammar; Arabic information extraction; Arabic language; N-gram terms collocation; N-gram words; NLP application; POST; corpus source; linguistics research; named-entity; natural language processing application; part of speech tagging; specific Surah; tagged Arabic corpus; Data mining; Educational institutions; Natural language processing; Pattern matching; Pragmatics; Speech; Tagging; Arabic Phrases; Computational linguistics; Information Extraction; Part-of-Speech Tagging (POST); n-gram;
fLanguage
English
Publisher
IEEE
Conference_Titel
Informatics and Systems (INFOS), 2014 9th International Conference on
Conference_Location
Cairo
Print_ISBN
978-977-403-689-7
Type
conf
DOI
10.1109/INFOS.2014.7036700
Filename
7036700