• DocumentCode
    2699387
  • Title

    News Item Extraction for Text Mining in Web Newspapers

  • Author

    Norvag, K.; Øyri, Randi

  • Author_Institution
    Department of Computer and Information Science, Norwegian University of Science and Technology, Trondheim, Norway
  • fYear
    2005
  • fDate
    08-09 April 2005
  • Firstpage
    195
  • Lastpage
    204
  • Abstract
    Web newspapers provide a valuable resource for information. In order to benefit more from the available information, text mining techniques can be applied. However, because each newspaper page often covers a lot of unrelated topics, page-based data mining will not always give useful results. In order to improve on complete-page mining, we present an approach based on extracting the individual news items from the web pages and mining these separately. Automatic news item extraction is a difficult problem, and in this paper we also provide strategies for solving that task. We study the quality of the news item extraction, and also provide results from clustering the extracted news items.
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Web Information Retrieval and Integration, 2005. WIRI '05. Proceedings. International Workshop on Challenges in
  • Print_ISBN
    0-7695-2414-1
  • Type

    conf

  • DOI
    10.1109/WIRI.2005.27
  • Filename
    1553014