• DocumentCode
    739686
  • Title
    Summarizing a Document by Trimming the Discourse Tree
  • Author
    Hirao, Tsutomu ; Nishino, Masaaki ; Yoshida, Yasuhisa ; Suzuki, Jun ; Yasuda, Norihito ; Nagata, Masaaki
  • Author_Institution
    NTT Commun. Sci. Labs., Nippon Telegraph & Telephone Corp., Kyoto, Japan
  • Volume
    23
  • Issue
    11
  • fYear
    2015
  • Firstpage
    2081
  • Lastpage
    2092
  • Abstract
    Recent studies on extractive text summarization formulate it as a combinatorial optimization problem, extracting the optimal subset from a set of textual units that maximizes an objective function without violating the length constraint. Although these methods successfully improve automatic evaluation scores, they do not consider the discourse structure in the source document. Thus, summaries generated by these methods may lack logical coherence. In previous work, we proposed a method that exploits a discourse tree structure to produce coherent summaries. By transforming a traditional discourse tree, namely a rhetorical structure theory-based discourse tree (RST-DT), into a dependency-based discourse tree (DEP-DT), we formulated the summarization procedure as a Tree Knapsack Problem whose tree corresponds to the DEP-DT. This paper extends that work with a detailed discussion of the approach together with a novel efficient dynamic programming algorithm for solving the Tree Knapsack Problem. Experiments show that our method not only achieved the highest score in both automatic and human evaluation, but also obtained good performance in terms of the linguistic qualities of the summaries.
  • Keywords
    combinatorial mathematics; dynamic programming; knapsack problems; optimisation; text analysis; tree data structures; DEP-DT; RST-DT; automatic evaluation; automatic evaluation scores; combinatorial optimization problem; dependency-based discourse tree; discourse tree trimming; document summarization; dynamic programming algorithm; extractive text summarization formulation; human evaluation; logical coherence; optimal subset extraction; rhetorical structure theory-based discourse tree; source document; tree knapsack problem; Coherence; Hidden Markov models; IEEE transactions; Optimization; Satellites; Speech; Speech processing; Discourse analysis; single-document summarization
  • fLanguage
    English
  • Journal_Title
    Audio, Speech, and Language Processing, IEEE/ACM Transactions on
  • Publisher
    IEEE
  • ISSN
    2329-9290
  • Type
    jour
  • DOI
    10.1109/TASLP.2015.2465150
  • Filename
    7180340
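
The abstract casts summarization as a Tree Knapsack Problem over the DEP-DT: choose textual units that maximize a score within a length limit, where a unit may be kept only if its parent in the tree is also kept. The sketch below illustrates that problem with a textbook bottom-up dynamic program; it is not the paper's own algorithm, which the abstract describes only as novel and efficient. The function name `tree_knapsack`, the dictionary-based tree encoding, and the toy lengths and scores are all assumptions made for illustration.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

Entry = Optional[Tuple[float, FrozenSet[int]]]


def tree_knapsack(
    children: Dict[int, List[int]],
    length: Dict[int, int],
    score: Dict[int, float],
    root: int,
    budget: int,
) -> Tuple[float, FrozenSet[int]]:
    """Pick a subtree containing the root (every kept node's parent is kept)
    that maximizes total score with total length <= budget.
    Textbook O(n * budget^2) DP; hypothetical helper, not the paper's method."""

    def solve(v: int) -> List[Entry]:
        # best[w] = (score, nodes) of the best selection inside v's subtree
        # that includes v and has total length exactly w; None if impossible.
        best: List[Entry] = [None] * (budget + 1)
        if length[v] <= budget:
            best[length[v]] = (score[v], frozenset([v]))
        for c in children.get(v, []):
            child = solve(c)
            merged = list(best)  # keeping 'best' means skipping c's subtree
            for w1, a in enumerate(best):
                if a is None:
                    continue
                for w2, b in enumerate(child):
                    if b is None or w1 + w2 > budget:
                        continue
                    cand = (a[0] + b[0], a[1] | b[1])
                    if merged[w1 + w2] is None or cand[0] > merged[w1 + w2][0]:
                        merged[w1 + w2] = cand
            best = merged
        return best

    feasible = [e for e in solve(root) if e is not None]
    if not feasible:
        return 0.0, frozenset()  # even the root alone exceeds the budget
    return max(feasible, key=lambda e: e[0])


# Toy tree (assumed data): unit 0 is the root, 1 and 2 depend on 0, 3 depends on 1.
children = {0: [1, 2], 1: [3]}
length = {0: 2, 1: 3, 2: 2, 3: 1}
score = {0: 3, 1: 4, 2: 1, 3: 5}

best_score, chosen = tree_knapsack(children, length, score, root=0, budget=6)
```

With a budget of 6 this toy instance keeps units 0, 1, and 3 for a total score of 12: unit 3 scores highest but can only be included together with its ancestors 1 and 0, which is the kind of constraint that keeps a trimmed discourse tree connected.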