DocumentCode :
2909875
Title :
Sentence Boundary Detection in Colloquial Arabic Text: A Preliminary Result
Author :
Al-Subaihin, Afnan A. ; Al-Khalifa, Hend S. ; Al-Salman, AbdulMalik S.
Author_Institution :
Coll. of Comput. & Inf. Sci., King Saud Univ., Riyadh, Saudi Arabia
fYear :
2011
fDate :
15-17 Nov. 2011
Firstpage :
30
Lastpage :
32
Abstract :
Natural language processing tasks are increasingly conducted over online content, which poses a particular problem for applications that target Arabic. Online Arabic content is usually written in informal colloquial Arabic, which is ill-structured and lacks linguistic standardization. In this paper, we investigate a preliminary step toward successful NLP: sentence boundary detection. Because informal Arabic lacks basic linguistic rules, we establish a list of commonly used punctuation marks after studying a large amount of informal Arabic text. We then evaluate how correctly these punctuation marks are used as sentence delimiters; the evaluation yielded a preliminary accuracy of 70%.
Keywords :
linguistics; natural language processing; text analysis; text detection; Arabic language; NLP; informal colloquial Arabic text; linguistic rules; online Arabic content; punctuation marks; sentence boundary detection; sentence delimiters; Accuracy; Computational linguistics; Conferences; Facebook; Pragmatics; Speech; colloquial Arabic; informal Arabic
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Asian Language Processing (IALP), 2011 International Conference on
Conference_Location :
Penang
Print_ISBN :
978-1-4577-1733-8
Type :
conf
DOI :
10.1109/IALP.2011.38
Filename :
6121463