Title :
Classification of XML Documents
Author :
Bouchachia, Abdelhamid ; Hassler, Marcus
Author_Institution :
Dept. of Informatics-Syst., Alpen-Adria-Univ., Klagenfurt
fDate :
March 1 2007-April 5 2007
Abstract :
With the explosion of XML-based online documents, knowledge discovery from the Web has become a highly significant task. Classification, which sorts documents into categories, is an appropriate tool for facilitating that task. This paper introduces a classification approach based on the k-nearest neighbor algorithm combined with an edit distance measure. The originality of the work lies in using both the content and the structure of XML documents to compute the edit distance. The approach is evaluated empirically on real-world XML collections.
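Illustrative sketch (not the authors' method; the paper's actual distance is not described in this record): a k-nearest neighbor classifier whose edit distance sees both structure and content. As assumptions, each document is flattened into a single token sequence (open/close tag tokens for structure, lower-cased words for content) and compared with plain Levenshtein distance. The names `linearize`, `edit_distance`, and `knn_classify` and the toy data are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def linearize(xml_text):
    """Flatten XML into tokens (hypothetical scheme, not from the paper).

    Structure: each element yields '<tag>' ... '</tag>' tokens.
    Content: words of its text (and children's tails), lower-cased.
    """
    tokens = []

    def visit(elem):
        tokens.append(f"<{elem.tag}>")
        if elem.text:
            tokens.extend(w.lower() for w in elem.text.split())
        for child in elem:
            visit(child)
            if child.tail:
                tokens.extend(w.lower() for w in child.tail.split())
        tokens.append(f"</{elem.tag}>")

    visit(ET.fromstring(xml_text))
    return tokens


def edit_distance(a, b):
    """Levenshtein distance between two token sequences, unit costs."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def knn_classify(query_xml, training, k=3):
    """Majority vote among the k training documents nearest to the query.

    training: list of (xml_text, label) pairs.
    """
    q = linearize(query_xml)
    scored = sorted(
        (edit_distance(q, linearize(doc)), label) for doc, label in training
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Toy data (hypothetical, for illustration only).
training = [
    ("<article><title>XML mining</title><body>tree edit distance</body></article>", "paper"),
    ("<article><title>XML retrieval</title><body>edit distance on trees</body></article>", "paper"),
    ("<recipe><name>Soup</name><step>boil water</step></recipe>", "recipe"),
    ("<recipe><name>Tea</name><step>boil water add leaves</step></recipe>", "recipe"),
]
query = "<article><title>XML classification</title><body>edit distance</body></article>"
print(knn_classify(query, training, k=3))  # paper
```

Here the query is 2 edits from the first training document, 3 from the second, and 10 and 11 from the recipes, so two of its three nearest neighbors vote "paper". Ties in the vote go to the label met first, i.e. the closest neighbor, because `Counter.most_common` keeps first-encountered order for equal counts. A true tree edit distance would respect nesting more faithfully than this flattening, at a higher computational cost.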
Keywords :
XML; classification; data mining; XML document classification; XML-based online documents; edit distance measure; k-nearest neighborhood algorithm; knowledge discovery; Computational intelligence; Data mining; Explosions; Information retrieval; Machinery; Software libraries; Standards development; Text categorization; Web mining; XML;
Conference_Titel :
Computational Intelligence and Data Mining, 2007. CIDM 2007. IEEE Symposium on
Conference_Location :
Honolulu, HI
Print_ISBN :
1-4244-0705-2
DOI :
10.1109/CIDM.2007.368901