• DocumentCode
    3779460
  • Title

    TALAA-ASC: A sentence compression corpus for Arabic

  • Author

    Riadh Belkebir;Ahmed Guessoum

  • Author_Institution
    Natural Language Processing and Machine Learning Research Group, Laboratory of Research in Artificial Intelligence, Computer Science Department, Université des Sciences et de la Technologie Houari Boumediene (USTHB), Algiers, Algeria
  • fYear
    2015
  • Firstpage
    1
  • Lastpage
    8
  • Abstract
    Sentence compression has been studied extensively for many languages, but little effort has been devoted to Arabic. One reason is the absence of Arabic sentence compression corpora: building and evaluating sentence compression systems requires parallel corpora consisting of source sentences and their corresponding compressions. In this paper, we present TALAA-ASC, the first Arabic sentence compression corpus. We describe the methodology we followed to construct the corpus and report the statistics and analyses we performed on it.
  • Keywords
    "XML","Buildings","Guidelines","Natural language processing","Supervised learning","Integer linear programming","Noise measurement"
  • Publisher
    IEEE
  • Conference_Titel
    2015 IEEE/ACS 12th International Conference of Computer Systems and Applications (AICCSA)
  • Electronic_ISBN
    2161-5330
  • Type

    conf

  • DOI
    10.1109/AICCSA.2015.7507228
  • Filename
    7507228