DocumentCode :
2922630
Title :
Multi-document Arabic text summarisation
Author :
El-Haj, Mahmoud ; Kruschwitz, Udo ; Fox, Chris
Author_Institution :
Sch. of Comput. Sci. & Electron. Eng., Univ. of Essex, Colchester, UK
fYear :
2011
fDate :
13-14 July 2011
Firstpage :
40
Lastpage :
44
Abstract :
In this paper we present our generic extractive Arabic and English multi-document summarisers. We also describe the use of machine translation for evaluating the generated Arabic multi-document summaries using English extractive gold standards. In this work we first address the lack of Arabic multi-document corpora for summarisation and the absence of automatic and manual Arabic gold-standard summaries. Both are required to evaluate automatic Arabic summarisers. Second, we demonstrate the use of Google Translate in creating an Arabic version of the DUC-2002 dataset. The parallel Arabic/English dataset is summarised using the Arabic and English summarisation systems. The automatically generated summaries are evaluated using the ROUGE metric, as well as precision and recall. Our results are compared with those of the top five systems in the DUC-2002 multi-document summarisation task.
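The abstract states that the generated summaries are scored with ROUGE alongside precision and recall. As a rough illustration only, and not the authors' evaluation setup or the official ROUGE toolkit, ROUGE-N recall and precision for one candidate summary against one reference can be computed from clipped n-gram counts. The function names `ngrams` and `rouge_n`, the single-reference restriction, and tokenisation by lowercase whitespace splitting (no stemming, stopword removal, or Arabic orthographic normalisation) are all assumptions made for this sketch.

```python
from collections import Counter


def ngrams(tokens: list[str], n: int) -> Counter:
    """Count the n-grams (as tuples) in a token sequence (hypothetical helper)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate: str, reference: str, n: int = 1) -> dict[str, float]:
    """Simplified single-reference ROUGE-N (hypothetical helper).

    Assumption: tokens are produced by lowercasing and splitting on
    whitespace; the official ROUGE toolkit offers further preprocessing.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Clipped overlap: each n-gram counts at most min(count in cand, count in ref).
    overlap = sum((cand & ref).values())
    ref_total = sum(ref.values())
    cand_total = sum(cand.values())
    recall = overlap / ref_total if ref_total else 0.0
    precision = overlap / cand_total if cand_total else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"recall": recall, "precision": precision, "f1": f1}


if __name__ == "__main__":
    cand = "the cat sat on the mat"
    ref = "the cat lay on the mat"
    print(rouge_n(cand, ref, n=1))  # 5 of 6 unigrams match in each direction
    print(rouge_n(cand, ref, n=2))  # 3 of 5 bigrams match
```

Recall (overlap divided by reference n-grams) is the quantity ROUGE-N was originally defined on; precision and F1 are included because the abstract reports precision and recall as well.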
Keywords :
language translation; natural language processing; text analysis; DUC-2002 dataset; English extractive gold standard; Google Translate; ROUGE metric; generic extractive Arabic multidocument summariser; generic extractive English multidocument summariser; machine translation; multidocument Arabic text summarisation system; Conferences; Google; Humans; Information retrieval; Measurement; Redundancy; Standards;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
2011 3rd Computer Science and Electronic Engineering Conference (CEEC)
Conference_Location :
Colchester
Print_ISBN :
978-1-4577-1300-2
Type :
conf
DOI :
10.1109/CEEC.2011.5995822
Filename :
5995822