• DocumentCode
    2922630
  • Title

    Multi-document Arabic text summarisation

  • Author

    El-Haj, Mahmoud ; Kruschwitz, Udo ; Fox, Chris

  • Author_Institution
    Sch. of Comput. Sci. & Electron. Eng., Univ. of Essex, Colchester, UK
  • fYear
    2011
  • fDate
    13-14 July 2011
  • Firstpage
    40
  • Lastpage
    44
  • Abstract
    In this paper we present our generic extractive Arabic and English multi-document summarisers. We also describe the use of machine translation for evaluating the generated Arabic multi-document summaries using English extractive gold standards. In this work we first address the lack of Arabic multi-document corpora for summarisation and the absence of automatic and manual Arabic gold-standard summaries. These are required to evaluate any automatic Arabic summarisers. Second, we demonstrate the use of Google Translate in creating an Arabic version of the DUC-2002 dataset. The parallel Arabic/English dataset is summarised using the Arabic and English summarisation systems. The automatically generated summaries are evaluated using the ROUGE metric, as well as precision and recall. The results we achieve are compared with the top five systems in the DUC-2002 multi-document summarisation task.
  • Keywords
    language translation; natural language processing; text analysis; DUC-2002 dataset; English extractive gold standard; Google Translate; ROUGE metric; generic extractive Arabic multidocument summariser; generic extractive English multidocument summariser; machine translation; multidocument Arabic text summarisation system; Conferences; Google; Humans; Information retrieval; Measurement; Redundancy; Standards
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Science and Electronic Engineering Conference (CEEC), 2011 3rd
  • Conference_Location
    Colchester
  • Print_ISBN
    978-1-4577-1300-2
  • Type

    conf

  • DOI
    10.1109/CEEC.2011.5995822
  • Filename
    5995822