Title :
Text mining: a new frontier for lossless compression
Author :
Witten, Ian H. ; Bray, Zane ; Mahoui, Malika ; Teahan, Bill
Author_Institution :
Dept. of Comput. Sci., Waikato Univ., Hamilton, New Zealand
Abstract :
Data mining, a burgeoning new technology, is about looking for patterns in data. Likewise, text mining is about looking for patterns in text. Text mining is possible because you do not have to understand text in order to extract useful information from it. Here are four examples. First, if only names could be identified, links could be inserted automatically to other places that mention the same name, links that are “dynamically evaluated” by calling upon a search engine to bind them at click time. Second, actions can be associated with different types of data, using either explicit programming or programming-by-demonstration techniques. A day/time specification appearing anywhere within one's E-mail could be associated with diary actions such as updating a personal organizer or creating an automatic reminder, and each mention of a day/time in the text could raise a popup menu of calendar-based actions. Third, text could be mined for data in tabular format, allowing databases to be created from formatted tables such as stock-market information on Web pages. Fourth, an agent could monitor incoming newswire stories for company names and collect documents that mention them, an automated press clipping service. This paper aims to promote text compression as a key technology for text mining.
Keywords :
data compression; data mining; text analysis; E-mail; automated press clipping service; data types; databases; diary actions; dynamic evaluation; explicit programming; formatted tables; links; lossless compression; programming-by-demonstration; search engine; text compression; text mining; text patterns; Amorphous materials; Artificial intelligence; Computer science; Computerized monitoring; Data mining; Databases; Information analysis; Natural languages; Search engines; Text mining; Vehicles; Vocabulary; Web pages
Conference_Title :
Data Compression Conference, 1999. Proceedings. DCC '99
Conference_Location :
Snowbird, UT
Print_ISBN :
0-7695-0096-X
DOI :
10.1109/DCC.1999.755669