• DocumentCode
    2909875
  • Title
    Sentence Boundary Detection in Colloquial Arabic Text: A Preliminary Result
  • Author
    Al-Subaihin, Afnan A. ; Al-Khalifa, Hend S. ; Al-Salman, AbdulMalik S.
  • Author_Institution
    Coll. of Comput. & Inf. Sci., King Saud Univ., Riyadh, Saudi Arabia
  • fYear
    2011
  • fDate
    15-17 Nov. 2011
  • Firstpage
    30
  • Lastpage
    32
  • Abstract
    Natural language processing tasks are increasingly conducted over online content. This poses a special problem for applications that process the Arabic language. Online Arabic content is usually written in informal colloquial Arabic, which is ill-structured and lacks linguistic standardization. In this paper, we investigate a preliminary step toward successful NLP: the problem of sentence boundary detection. Because informal Arabic lacks basic linguistic rules, we establish a list of commonly used punctuation marks after extensively studying a large amount of informal Arabic text. Moreover, we evaluate how correctly these punctuation marks are used as sentence delimiters; the evaluation yielded a preliminary accuracy of 70%.
  • Keywords
    linguistics; natural language processing; text analysis; text detection; Arabic language; NLP; informal colloquial Arabic text; linguistic rules; online Arabic content; punctuation marks; sentence boundary detection; sentence delimiters; Accuracy; Computational linguistics; Conferences; Facebook; Pragmatics; Speech; colloquial Arabic; informal Arabic
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Asian Language Processing (IALP), 2011 International Conference on
  • Conference_Location
    Penang
  • Print_ISBN
    978-1-4577-1733-8
  • Type
    conf
  • DOI
    10.1109/IALP.2011.38
  • Filename
    6121463