• DocumentCode
    2334922
  • Title

    Preparations for semantics-based XML mining

  • Author

    Lee, Jung-Won ; Lee, KiHo ; Kim, Won

  • fYear
    2001
  • fDate
    2001
  • Firstpage
    345
  • Lastpage
    352
  • Abstract
    XML allows users to define elements using arbitrary words and organize them in a nested structure. These features of XML offer both challenges and opportunities in information retrieval, document management, and data mining. In this paper, we propose a new methodology for preparing XML documents for quantitative determination of similarity between XML documents by taking into account XML semantics (i.e., meanings of the elements and nested structures of XML documents). Accurate quantitative determination of similarity between XML documents provides an important basis for a variety of applications of XML document mining and processing. Experiments with XML documents show that our methodology provides a 50-100% improvement in determining similarity over the traditional vector-space model that considers only term-frequency and 100% accuracy in identifying the category of each document from an on-line bookstore.
  • Keywords
    data mining; hypermedia markup languages; query processing; XML document preparation; document management; information retrieval; on-line bookstore; quantitative similarity determination; semantics-based XML mining; Books; Buildings; Data engineering; Data mining; Indexes; Information management; Information retrieval; Power system management; Spatial databases; XML
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining, 2001. ICDM 2001, Proceedings IEEE International Conference on
  • Conference_Location
    San Jose, CA
  • Print_ISBN
    0-7695-1119-8
  • Type

    conf

  • DOI
    10.1109/ICDM.2001.989538
  • Filename
    989538