• DocumentCode
    339685
  • Title
    The TaxGen framework: automating the generation of a taxonomy for a large document collection
  • Author
    Muller, A.; Dorre, J.; Gerstl, P.; Seiffert, R.
  • Author_Institution
    Dept. of Software Solutions Dev., IBM Germany, Germany
  • Volume
    Track2
  • fYear
    1999
  • fDate
    5-8 Jan. 1999
  • Abstract
    Text mining is an active area of research and development that combines and extends techniques from related areas such as information retrieval, computational linguistics, and data mining to analyze large corpora of digital documents. This paper describes the TaxGen text mining project carried out at the IBM Software Development Lab in Boeblingen, Germany. The goal of TaxGen was to automatically generate a taxonomy for a collection of previously unstructured documents, namely a set of 73,000 news wire documents spanning one year.
  • Keywords
    classification; computational linguistics; data mining; information retrieval; text analysis; very large databases; IBM Software Development Lab., Boeblingen, Germany; TaxGen text mining project; automatic taxonomy generation; digital documents; large document collection; news wire documents; text corpus analysis; unstructured documents; Information analysis; Performance analysis; Programming; Research and development; Taxonomy; Text mining; Wire
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Systems Sciences, 1999. HICSS-32. Proceedings of the 32nd Annual Hawaii International Conference on
  • Conference_Location
    Maui, HI, USA
  • Print_ISBN
    0-7695-0001-3
  • Type
    conf
  • DOI
    10.1109/HICSS.1999.772687
  • Filename
    772687