Title :
Summarizing a Document by Trimming the Discourse Tree
Author :
Hirao, Tsutomu ; Nishino, Masaaki ; Yoshida, Yasuhisa ; Suzuki, Jun ; Yasuda, Norihito ; Nagata, Masaaki
Author_Institution :
NTT Commun. Sci. Labs., Nippon Telegraph & Telephone Corp., Kyoto, Japan
Abstract :
Recent studies on extractive text summarization formulate it as a combinatorial optimization problem: extract the subset of textual units that maximizes an objective function without violating a length constraint. Although these methods successfully improve automatic evaluation scores, they do not consider the discourse structure of the source document, so the summaries they generate may lack logical coherence. In previous work, we proposed a method that exploits a discourse tree structure to produce coherent summaries. By transforming a traditional discourse tree, namely a rhetorical structure theory-based discourse tree (RST-DT), into a dependency-based discourse tree (DEP-DT), we formulated the summarization procedure as a Tree Knapsack Problem whose tree corresponds to the DEP-DT. This paper extends that work with a detailed discussion of the approach and a novel, efficient dynamic programming algorithm for solving the Tree Knapsack Problem. Experiments show that our method not only achieved the highest scores in both automatic and human evaluation, but also performed well in terms of the linguistic quality of the summaries.
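To make the Tree Knapsack formulation concrete, the following is a minimal sketch and not the paper's algorithm. It assumes the common rooted-subtree variant, in which a unit may be selected only if its parent in the tree is also selected, and it uses a plain textbook dynamic program over integer budgets. The function name `tree_knapsack`, the toy tree, and the lengths and scores are all hypothetical, chosen only for illustration.

```python
from typing import Dict, List, Optional, Tuple

Entry = Optional[Tuple[float, List[int]]]  # (total score, selected nodes) or None


def tree_knapsack(children: Dict[int, List[int]],
                  length: Dict[int, int],
                  score: Dict[int, float],
                  root: int,
                  budget: int) -> Tuple[float, List[int]]:
    """Pick a subtree containing `root` that maximizes total score under a
    total-length budget. Assumes every length is a positive integer and that
    a node may be selected only together with its parent."""

    def solve(v: int) -> List[Entry]:
        # table[b]: best selection inside v's subtree that includes v and
        # has total length <= b (None if v itself does not fit in b).
        table: List[Entry] = [None] * (budget + 1)
        for b in range(length[v], budget + 1):
            table[b] = (score[v], [v])
        for c in children.get(v, []):
            child = solve(c)
            merged = list(table)  # default: skip this child entirely
            for b in range(budget + 1):
                for k in range(1, b + 1):  # k = budget handed to the child
                    rest, sub = table[b - k], child[k]
                    if rest is None or sub is None:
                        continue
                    cand = rest[0] + sub[0]
                    if merged[b] is None or cand > merged[b][0]:
                        merged[b] = (cand, rest[1] + sub[1])
            table = merged
        return table

    best = solve(root)[budget]
    return best if best is not None else (0.0, [])


# Hypothetical toy tree: node 0 is the root; 1 and 2 are its children;
# 3 is a child of 1 and 4 is a child of 2.
toy_children = {0: [1, 2], 1: [3], 2: [4]}
toy_length = {0: 2, 1: 3, 2: 2, 3: 1, 4: 4}
toy_score = {0: 1.0, 1: 2.0, 2: 1.5, 3: 3.0, 4: 6.0}

best_score, chosen = tree_knapsack(toy_children, toy_length, toy_score,
                                   root=0, budget=8)
```

In this toy instance the two highest-scoring units are nodes 3 and 4, but under the assumed parent constraint neither can be taken without its ancestors, so the selection is forced to stay a connected subtree rooted at node 0. That dependency is what distinguishes the tree formulation from a flat knapsack over independent units.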
Keywords :
combinatorial mathematics; dynamic programming; knapsack problems; optimisation; text analysis; tree data structures; DEP-DT; RST-DT; automatic evaluation; automatic evaluation scores; combinatorial optimization problem; dependency-based discourse tree; discourse tree trimming; document summarization; dynamic programming algorithm; extractive text summarization formulation; human evaluation; logical coherence; optimal subset extraction; rhetorical structure theory-based discourse tree; source document; tree knapsack problem; Coherence; Hidden Markov models; IEEE transactions; Optimization; Satellites; Speech; Speech processing; Discourse analysis; single-document summarization
Journal_Title :
Audio, Speech, and Language Processing, IEEE/ACM Transactions on
DOI :
10.1109/TASLP.2015.2465150