• DocumentCode
    2180419
  • Title
    Concept-based classification for multi-document summarization
  • Author
    Celikyilmaz, Asli ; Hakkani-Tür, Dilek
  • Author_Institution
    Univ. of California, Berkeley, CA, USA
  • fYear
    2011
  • fDate
    22-27 May 2011
  • Firstpage
    5540
  • Lastpage
    5543
  • Abstract
    Documents inherently contain many concepts that reflect both specific and generic aspects. To automatically generate a short summary of documents on similar topics, it is imperative to discover the general aspects in those documents, because summaries usually contain general rather than specific concepts. This paper presents a semi-supervised extractive summarization model based on latent concept classification that can differentiate between the two types of aspects, treated as hidden concepts mentioned in the documents. A classifier is trained on hidden concepts discovered from documents and their corresponding human-generated summaries using a probabilistic Bayesian model: the summary-focused topic model. Experimental results based on ROUGE evaluations indicate that ranking candidate sentences by latent summary concept classification improves the quality of the generated summaries.
  • Keywords
    Bayes methods; belief networks; pattern classification; text analysis; ROUGE evaluation; concept based classification; documents text; human generated summary; latent summary concept classification; multi document summarization; probabilistic Bayesian model; semi supervised extractive summarization model; summary focused topic model; Feature extraction; Frequency measurement; Humans; Predictive models; Probabilistic logic; Testing; Training; Automatic document summarization; latent topic classification;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on
  • Conference_Location
    Prague
  • ISSN
    1520-6149
  • Print_ISBN
    978-1-4577-0538-0
  • Electronic_ISBN
    1520-6149
  • Type
    conf
  • DOI
    10.1109/ICASSP.2011.5947614
  • Filename
    5947614