• DocumentCode
    2190381
  • Title
    Hierarchical theme and topic model for summarization
  • Author
    Jen-Tzung Chien ; Ying-Lan Chang
  • Author_Institution
    Dept. of Electr. & Comput. Eng., Nat. Chiao Tung Univ., Hsinchu, Taiwan
  • fYear
    2013
  • fDate
    22-25 Sept. 2013
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    This paper presents a hierarchical summarization model to extract representative sentences from a set of documents. In this study, we select the thematic sentences and identify the topical words based on a hierarchical theme and topic model (H2TM). The latent themes and topics are inferred from the document collection. A tree stick-breaking process is proposed to draw the theme proportions for the representation of sentences. Structural learning is performed without fixing the number of themes and topics. The H2TM is delicate and flexible enough to represent words and sentences from heterogeneous documents. Thematic sentences are effectively extracted for document summarization. In the experiments, the proposed H2TM outperforms the other methods in terms of precision, recall, and F-measure.
  • Keywords
    document handling; learning (artificial intelligence); tree data structures; H2TM; document summarization; heterogeneous documents; hierarchical summarization model; hierarchical theme; representative sentences extraction; structural learning; thematic sentences; topic model; tree stick-breaking process; Bayes methods; Cities and towns; Computational modeling; Conferences; Data collection; Data models; Bayesian nonparametrics
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Machine Learning for Signal Processing (MLSP), 2013 IEEE International Workshop on
  • Conference_Location
    Southampton
  • ISSN
    1551-2541
  • Type
    conf
  • DOI
    10.1109/MLSP.2013.6661943
  • Filename
    6661943