DocumentCode
2909875
Title
Sentence Boundary Detection in Colloquial Arabic Text: A Preliminary Result
Author
Al-Subaihin, Afnan A. ; Al-Khalifa, Hend S. ; Al-Salman, AbdulMalik S.
Author_Institution
Coll. of Comput. & Inf. Sci., King Saud Univ., Riyadh, Saudi Arabia
fYear
2011
fDate
15-17 Nov. 2011
Firstpage
30
Lastpage
32
Abstract
Natural language processing tasks are increasingly conducted over online content, which poses a particular problem for Arabic-language applications. Online Arabic content is usually written in informal colloquial Arabic, which is often ill-structured and lacks linguistic standardization. In this paper, we investigate a preliminary step toward successful NLP processing: sentence boundary detection. Because informal Arabic lacks basic linguistic rules, we established a list of commonly used punctuation marks after extensively studying a large amount of informal Arabic text. We then evaluated how correctly these punctuation marks are used as sentence delimiters; the evaluation yielded a preliminary accuracy of 70%.
Keywords
linguistics; natural language processing; text analysis; text detection; Arabic language; NLP; informal colloquial Arabic text; linguistic rules; online Arabic content; punctuation marks; sentence boundary detection; sentence delimiters; Accuracy; Computational linguistics; Conferences; Facebook; Pragmatics; Speech; colloquial Arabic; informal Arabic
fLanguage
English
Publisher
IEEE
Conference_Titel
Asian Language Processing (IALP), 2011 International Conference on
Conference_Location
Penang
Print_ISBN
978-1-4577-1733-8
Type
conf
DOI
10.1109/IALP.2011.38
Filename
6121463