  • DocumentCode
    73750
  • Title
    Extractive Broadcast News Summarization Leveraging Recurrent Neural Network Language Modeling Techniques
  • Author
    Kuan-Yu Chen ; Shih-Hung Liu ; Berlin Chen ; Hsin-Min Wang ; Ea-Ee Jan ; Wen-Lian Hsu ; Hsin-Hsi Chen
  • Author_Institution
    National Taiwan University, Taipei, Taiwan
  • Volume
    23
  • Issue
    8
  • fYear
    2015
  • fDate
    Aug. 2015
  • Firstpage
    1322
  • Lastpage
    1334
  • Abstract
    Extractive text or speech summarization aims to select a set of salient sentences from an original document and concatenate them to form a summary, enabling users to browse through and understand the content of the document more easily. A recent line of research on extractive summarization employs the language modeling (LM) approach for important sentence selection, which has proven effective for performing speech summarization in an unsupervised fashion. However, a major challenge facing the LM approach is how to formulate the sentence models and accurately estimate their parameters for each sentence in the document to be summarized. In view of this, our work in this paper explores a novel use of the recurrent neural network language modeling (RNNLM) framework for extractive broadcast news summarization. Within this framework, the deduced sentence models capture not only word-usage cues but also long-span structural information about word co-occurrence relationships within broadcast news documents, circumventing the strict bag-of-words assumption. Furthermore, different model complexities and combinations are extensively analyzed and compared. Experimental results demonstrate the performance merits of our summarization methods when compared with several well-studied state-of-the-art unsupervised methods.
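    For context, the LM approach described in the abstract scores each candidate sentence by the likelihood its sentence model assigns to the document, then extracts the top-ranked sentences. The following is a minimal sketch of that ranking criterion, using a unigram sentence model with Jelinek-Mercer smoothing as a stand-in for the paper's RNNLM sentence models; the function names and the smoothing weight `lam` are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter

def sentence_score(sentence, doc_tokens, bg_counts, bg_total, lam=0.5):
    # Log-likelihood of the whole document under a sentence-specific
    # unigram model, linearly interpolated (Jelinek-Mercer) with a
    # background model. The paper replaces this unigram sentence model
    # with an RNNLM; the ranking criterion log P(D | S) is the same.
    sent_counts = Counter(sentence)
    sent_len = max(len(sentence), 1)
    score = 0.0
    for word, count in Counter(doc_tokens).items():
        p_sent = sent_counts[word] / sent_len
        p_bg = bg_counts.get(word, 0) / bg_total
        p = lam * p_sent + (1.0 - lam) * p_bg
        if p > 0.0:
            score += count * math.log(p)
    return score

def extract_summary(sentences, ratio=0.3):
    # Rank sentences by how well each one "generates" the document,
    # keep the top fraction, and restore original order for readability.
    doc_tokens = [w for s in sentences for w in s]
    bg_counts = Counter(doc_tokens)  # stand-in for a large background corpus
    bg_total = sum(bg_counts.values())
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: sentence_score(sentences[i], doc_tokens,
                                     bg_counts, bg_total),
        reverse=True,
    )
    keep = sorted(ranked[: max(1, int(len(sentences) * ratio))])
    return [sentences[i] for i in keep]

if __name__ == "__main__":
    doc = [
        "the senate passed the budget bill today".split(),
        "lawmakers debated for hours".split(),
        "the bill now goes to the president".split(),
    ]
    for s in extract_summary(doc, ratio=0.4):
        print(" ".join(s))
```

    In the paper's setting, the unigram sentence model above would be replaced by an RNNLM trained per sentence, which is what lets the score reflect long-span word co-occurrence structure rather than bag-of-words counts alone.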
  • Keywords
    electronic publishing; information retrieval; natural language processing; recurrent neural nets; speech processing; text analysis; unsupervised learning; LM approach; RNNLM framework; bag-of-words assumption; extractive broadcast news summarization; extractive text summarization; important sentence selection; long-span structural information; recurrent neural network language modeling techniques; sentence models; speech summarization; word co-occurrence relationships; Data models; IEEE transactions; Recurrent neural networks; Speech; Speech processing; Speech recognition; Training; Language modeling; long-span structural information; recurrent neural network; speech summarization
  • fLanguage
    English
  • Journal_Title
    IEEE/ACM Transactions on Audio, Speech, and Language Processing
  • Publisher
    IEEE
  • ISSN
    2329-9290
  • Type
    jour
  • DOI
    10.1109/TASLP.2015.2432578
  • Filename
    7111264