DocumentCode
339685
Title
The TaxGen framework: automating the generation of a taxonomy for a large document collection
Author
Muller, A. ; Dorre, J. ; Gerstl, P. ; Seiffert, R.
Author_Institution
Dept. of Software Solutions Dev., IBM Germany, Germany
Volume
Track2
fYear
1999
fDate
5-8 Jan. 1999
Abstract
Text mining is an active area of research and development, which combines and expands techniques found in related areas like information retrieval, computational linguistics and data mining to perform an analysis of large corpora of digital documents. This paper describes the TaxGen text mining project carried out at the IBM Software Development Lab. at Boeblingen, Germany. The goal of TaxGen was the automatic generation of a taxonomy for a collection of previously unstructured documents, namely a set of 73,000 news wire documents spanning one year.
Keywords
classification; computational linguistics; data mining; information retrieval; text analysis; very large databases; IBM Software Development Lab., Boeblingen, Germany; TaxGen text mining project; automatic taxonomy generation; digital documents; large document collection; news wire documents; text corpus analysis; unstructured documents; Information analysis; Performance analysis; Programming; Research and development; Taxonomy; Text mining; Wire
fLanguage
English
Publisher
IEEE
Conference_Title
Proceedings of the 32nd Annual Hawaii International Conference on Systems Sciences (HICSS-32), 1999
Conference_Location
Maui, HI, USA
Print_ISBN
0-7695-0001-3
Type
conf
DOI
10.1109/HICSS.1999.772687
Filename
772687