Title :
News Keyword Extraction for Topic Tracking
Author :
Lee, Sungjick ; Kim, Han-Joon
Author_Institution :
Dept. of Electr. & Comput. Eng., Seoul Univ., Seoul
Abstract :
This paper presents a keyword extraction technique that can be used for tracking topics over time. In our work, keywords are a set of significant words in an article that gives a high-level description of its contents to readers. Identifying keywords from a large amount of on-line news data is very useful in that it can produce a short summary of news articles. As on-line text documents rapidly increase in size with the growth of the WWW, keyword extraction has become a basis of several text mining applications such as search engines, text categorization, summarization, and topic detection. Manual keyword extraction is an extremely difficult and time-consuming task; in fact, it is almost impossible to extract keywords manually in the case of news articles published in a single day due to their volume. For rapid use of keywords, we need to establish an automated process that extracts keywords from news articles. We propose an unsupervised keyword extraction technique that includes several variants of the conventional TF-IDF model with reasonable heuristics.
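The abstract names the conventional TF-IDF model as the starting point but does not describe the proposed variants or heuristics. As a rough illustration of that baseline only, the sketch below scores each term by its relative frequency in an article multiplied by the log of its inverse document frequency, then keeps the top-scoring terms as keywords. The function name `tfidf_keywords`, the `top_k` parameter, and the whitespace tokenization are assumptions made for this example and are not taken from the paper.

```python
import math
from collections import Counter


def tfidf_keywords(docs, top_k=3):
    """Return the top_k highest-scoring terms per document using plain TF-IDF.

    Hypothetical helper: this is the textbook baseline, not the paper's variants.
    """
    # Assumption: lowercase + whitespace split stands in for real tokenization.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    results = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Score = (relative term frequency) * log(N / document frequency).
        scores = {
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
        # Sort by descending score, breaking ties alphabetically.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        results.append(ranked[:top_k])
    return results
```

Calling `tfidf_keywords(articles, top_k=5)` on a list of article strings returns one keyword list per article. Terms that appear in every article receive a score of zero, which is the property that makes TF-IDF usable without labeled training data.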
Keywords :
Internet; data mining; information resources; information retrieval; search engines; text analysis; unsupervised learning; WWW; news keyword extraction; online text document; search engine; text categorization; text mining; text summarization; topic detection; topic tracking; unsupervised technique; Computer networks; Data mining; Filtering; Frequency; Information management; Portals; Text categorization; Text mining; Web and internet services; World Wide Web; Information Retrieval; TF-IDF; Text Mining; Keyword Extraction
Conference_Title :
Networked Computing and Advanced Information Management, 2008. NCM '08. Fourth International Conference on
Conference_Location :
Gyeongju
Print_ISBN :
978-0-7695-3322-3
DOI :
10.1109/NCM.2008.199