• DocumentCode
    77034
  • Title
    Automatic Twitter Topic Summarization With Speech Acts
  • Author
    Renxian Zhang ; Wenjie Li ; Dehong Gao ; You Ouyang

  • Author_Institution
    Innovative Intell. Comput. Center, Hong Kong Polytech. Univ. Shenzhen Res. Inst., Shenzhen, China
  • Volume
    21
  • Issue
    3
  • fYear
    2013
  • fDate
    Mar-13
  • Firstpage
    649
  • Lastpage
    658
  • Abstract
    With the growth of the social media service of Twitter, automatic summarization of Twitter messages (tweets) is in urgent need for efficient processing of the massive tweeted information. Unlike multi-document summarization in general, Twitter topic summarization must handle the numerous, short, dissimilar, and noisy nature of tweets. To address this challenge, we propose a novel speech act-guided summarization approach in this work. Speech acts characterize tweeters' communicative behavior and provide an organized view of their messages. Speech act recognition is a multi-class classification problem, which we solve by using word-based and symbol-based features that capture both the linguistic features of speech acts and the particularities of Twitter text. The recognized speech acts in tweets are then used to direct the extraction of key words and phrases to fill in templates designed for speech acts. Leveraging high-ranking words and phrases as well as topic information for major speech acts, we propose a round-robin algorithm to generate template-based summaries. Different from the extractive method adopted in most previous works, our summarization method is abstractive. Evaluated on two 100-topic datasets, the summaries generated by our method outperform two kinds of representative extractive summaries and rival human-written summaries in terms of explanatoriness and informativeness.
  • Keywords
    document handling; pattern classification; social networking (online); speech processing; Twitter messages; Twitter text; automatic Twitter topic summarization; communicative behavior; extractive method; high-ranking words; human-written summaries; multiclass classification problem; multidocument summarization; round-robin algorithm; social media service; speech act recognition; speech act-guided summarization approach; symbol-based features; template-based summaries; word-based features; Media; Noise measurement; Pragmatics; Speech; Speech recognition; Twitter; abstractive summarization; key word/phrase extraction; speech act;
  • fLanguage
    English
  • Journal_Title
    Audio, Speech, and Language Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1558-7916
  • Type
    jour
  • DOI
    10.1109/TASL.2012.2229984
  • Filename
    6362185