  • DocumentCode
    2772636
  • Title
    Multi-document Summarization by Information Distance
  • Author
    Long, Chong ; Huang, Minlie ; Zhu, Xiaoyan ; Li, Ming
  • Author_Institution
    Dept. of Comput. Sci. & Technol., Tsinghua Univ., Beijing, China
  • fYear
    2009
  • fDate
    6-9 Dec. 2009
  • Firstpage
    866
  • Lastpage
    871
  • Abstract
    Fast-changing knowledge on the Internet can be acquired more efficiently with the help of automatic document summarization and updating techniques. This paper describes a novel approach to multi-document update summarization. The best summary is defined as the one with the minimum information distance to the entire document set. The best update summary has the minimum conditional information distance to a document cluster, given that a prior document cluster has already been read. Experiments on the DUC 2007 and TAC 2008 datasets show that our method correlates closely with human summaries and outperforms other programs, such as LexRank, in many categories under the ROUGE evaluation criterion.
  • Keywords
    data mining; text analysis; Internet; ROUGE evaluation criterion; conditional information distance; minimum information distance; multidocument update summarization; Australia; Computer science; Government; Humans; Information science; Information theory; Intelligent systems; Text mining; Information Distance; Kolmogorov Complexity
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining, 2009. ICDM '09. Ninth IEEE International Conference on
  • Conference_Location
    Miami, FL
  • ISSN
    1550-4786
  • Print_ISBN
    978-1-4244-5242-2
  • Electronic_ISBN
    1550-4786
  • Type
    conf
  • DOI
    10.1109/ICDM.2009.107
  • Filename
    5360325