Title :
Using clustering technology to improve XML semantic search
Author_Institution :
Dept. of Electron. & Commun. Eng., North China Electr. Power Univ., Baoding
Abstract :
To return semantically related results for simple keyword queries, an XML search engine must not only find the matching nodes but also check whether those nodes are semantically related within the XML tree. Because judging semantic relatedness can be time-consuming, we first use mining techniques to cluster the XML documents and compute the similarity between the query and each cluster, so that clusters unrelated to the query can be filtered out. To obtain accurate clusters, we use all paths of length less than or equal to L as the feature vector of an XML document. We represent the feature vector matrix as a bipartite graph and store that graph in an adjacency list. Based on this idea, we improve the path-based XML clustering algorithm. We use common paths as the feature of a cluster and give a similarity measure between a query and the clusters.
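The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not define how path length is counted, how documents are grouped, or what the similarity measure is. The sketch assumes that a path's length is its number of tags, that clusters are already given, and that similarity is the fraction of query paths found among a cluster's common paths. All function names (`path_features`, `build_bipartite`, `common_paths`, `query_similarity`) are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def path_features(xml_text, max_len):
    """Return every downward tag path with at most max_len tags.

    Assumption: 'length' counts tags on the path; the abstract does not say.
    """
    root = ET.fromstring(xml_text)
    features = set()

    def walk(node, trail):
        # Keep only the last max_len tags on the way down to this node.
        trail = (trail + [node.tag])[-max_len:]
        # Each suffix of the trail is a path of length <= max_len ending here.
        for start in range(len(trail)):
            features.add("/".join(trail[start:]))
        for child in node:
            walk(child, trail)

    walk(root, [])
    return features


def build_bipartite(docs, max_len):
    """Store the document-path bipartite graph as two adjacency lists."""
    doc_to_paths = {}
    path_to_docs = defaultdict(set)
    for doc_id, xml_text in docs.items():
        paths = path_features(xml_text, max_len)
        doc_to_paths[doc_id] = paths
        for path in paths:
            path_to_docs[path].add(doc_id)
    return doc_to_paths, path_to_docs


def common_paths(cluster, doc_to_paths):
    """Cluster feature: the paths shared by every document in the cluster."""
    return set.intersection(*(doc_to_paths[d] for d in cluster))


def query_similarity(query_paths, cluster_paths):
    """Assumed measure: share of query paths present in the cluster feature."""
    if not query_paths:
        return 0.0
    return len(query_paths & cluster_paths) / len(query_paths)


docs = {
    "d1": "<lib><book><title/><author/></book></lib>",
    "d2": "<lib><book><title/><year/></book></lib>",
    "d3": "<shop><item><price/></item></shop>",
}
doc_to_paths, path_to_docs = build_bipartite(docs, max_len=2)
library_feature = common_paths(["d1", "d2"], doc_to_paths)
shop_feature = common_paths(["d3"], doc_to_paths)
query = {"book/title"}
print(query_similarity(query, library_feature))
print(query_similarity(query, shop_feature))
```

In this sketch a cluster whose similarity falls below some chosen threshold would be skipped before the more expensive check for semantically related nodes; the threshold itself is also an assumption and is left to the caller.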
Keywords :
XML; data mining; graph theory; information filters; matrix algebra; pattern clustering; query processing; search engines; tree data structures; XML document clustering; XML semantic search engine; XML tree; adjacency list; bipartite graph; feature vector matrix; mining technology; query processing; Bipartite graph; Clustering algorithms; Costs; Cybernetics; Electronic mail; Filters; Machine learning; Power engineering and energy; Search engines; XML; Adjacency List; Path Feature; Semantic Search; XML Clustering;
Conference_Title :
Machine Learning and Cybernetics, 2008 International Conference on
Conference_Location :
Kunming
Print_ISBN :
978-1-4244-2095-7
Electronic_ISBN :
978-1-4244-2096-4
DOI :
10.1109/ICMLC.2008.4620853