• DocumentCode
    653936
  • Title
    Pasokh: A standard corpus for the evaluation of Persian text summarizers
  • Author
    Behmadi Moghaddas, Behdad ; Kahani, Mohsen ; Toosi, Seyyed Ahmad ; Pourmasoumi, Asef ; Estiri, Ahmad
  • Author_Institution
    Web Technol. Lab., Ferdowsi Univ. of Mashhad, Mashhad, Iran
  • fYear
    2013
  • fDate
    Oct. 31 2013-Nov. 1 2013
  • Firstpage
    471
  • Lastpage
    475
  • Abstract
    The increasingly vast amount of information, particularly on the Web, has created a profound need for automatic summarization systems. These systems, in turn, need to be evaluated in terms of how well they retrieve information. Evaluation is done by comparing machine summaries against a standard reference corpus containing a reasonably large number of source texts and the summaries that humans have written for them. Because no such standard corpus existed for Persian, previously developed summarizers were evaluated against small corpora constructed by the developers of each proposed system, which made the systems non-comparable. Pasokh was therefore constructed as a standard reference corpus of sufficient size. Its construction took over 2000 man-hours of work.
  • Keywords
    information retrieval; natural language processing; text analysis; Pasokh; Persian text summarizer evaluation; automatic summarization systems; machine summaries; reference corpus; text sources; Calendars; Cultural differences; Databases; Economics; Guidelines; Standards; XML; computational processing of Persian; evaluation corpus; evaluation of automatic summarization; multi-document automatic summarization; single-document automatic summarization
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer and Knowledge Engineering (ICCKE), 2013 3rd International eConference on
  • Conference_Location
    Mashhad
  • Print_ISBN
    978-1-4799-2092-1
  • Type
    conf
  • DOI
    10.1109/ICCKE.2013.6682873
  • Filename
    6682873