DocumentCode
2699387
Title
News Item Extraction for Text Mining in Web Newspapers
Author
Norvag, K.; Øyri, Randi
Author_Institution
Department of Computer and Information Science, Norwegian University of Science and Technology, Trondheim, Norway
fYear
2005
fDate
08-09 April 2005
Firstpage
195
Lastpage
204
Abstract
Web newspapers are a valuable source of information. To benefit more from this information, text mining techniques can be applied. However, because each newspaper page often covers many unrelated topics, page-based data mining will not always give useful results. To improve on complete-page mining, we present an approach based on extracting the individual news items from the web pages and mining them separately. Automatic news item extraction is a difficult problem, and in this paper we also provide strategies for solving that task. We study the quality of the news item extraction and also provide results from clustering the extracted news items.
fLanguage
English
Publisher
ieee
Conference_Titel
Web Information Retrieval and Integration, 2005. WIRI '05. Proceedings. International Workshop on Challenges in
Print_ISBN
0-7695-2414-1
Type
conf
DOI
10.1109/WIRI.2005.27
Filename
1553014