Title :
Dragon Toolkit: Incorporating Auto-Learned Semantic Knowledge into Large-Scale Text Retrieval and Mining
Author :
Zhou, Xiaohua ; Zhang, Xiaodan ; Hu, Xiaohua
Author_Institution :
Drexel Univ., Philadelphia
Abstract :
Most text retrieval and mining techniques still rely on exact matching of features (e.g., words) and cannot incorporate text semantics. Many researchers believe that extending these techniques with semantic knowledge could improve results, and various methods, most of them heuristic, have been proposed to account for concept hierarchy, synonymy, and other semantic relationships. However, results with such semantic extensions have been mixed, ranging from slight improvements to decreases in effectiveness, most likely due to the lack of a formal framework. We instead propose a novel method that addresses semantic extension within the framework of language modeling. Our method extracts explicit topic signatures from documents and then statistically maps them into single-word features. Incorporating semantic knowledge then reduces to smoothing unigram language models with that knowledge. The Dragon Toolkit reflects our method, and its effectiveness is demonstrated on three tasks: text retrieval, text classification, and text clustering.
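The smoothing step described in the abstract can be illustrated with a minimal sketch. The abstract gives no formula, so the linear interpolation below, the mixing weight `lam`, the function name `smoothed_unigram`, and the `translation` table (an assumed mapping from each topic signature to a probability distribution over single words) are all hypothetical and are not taken from the Dragon Toolkit API.

```python
from collections import Counter


def smoothed_unigram(doc_words, doc_topics, translation, lam=0.5):
    """Estimate p(w|d) by mixing two models (illustrative assumption only):

      p(w|d) = (1 - lam) * p_ml(w|d) + lam * sum_t p(w|t) * p_ml(t|d)

    doc_words   -- list of word tokens in the document
    doc_topics  -- list of topic signatures extracted from the document
    translation -- dict: topic signature -> {word: p(word|topic)}
    lam         -- hypothetical weight of the semantic component
    """
    word_counts = Counter(doc_words)
    topic_counts = Counter(doc_topics)
    n_words = sum(word_counts.values())
    n_topics = sum(topic_counts.values())

    # With no topic signatures, fall back to the plain unigram model.
    if n_topics == 0:
        lam = 0.0

    # Vocabulary: observed words plus words reachable through topic signatures.
    vocab = set(word_counts)
    for topic in topic_counts:
        vocab.update(translation.get(topic, {}))

    model = {}
    for w in vocab:
        p_ml = word_counts[w] / n_words if n_words else 0.0
        p_sem = 0.0
        for topic, count in topic_counts.items():
            p_sem += translation.get(topic, {}).get(w, 0.0) * count / n_topics
        model[w] = (1 - lam) * p_ml + lam * p_sem
    return model


# Toy data, invented for illustration.
model = smoothed_unigram(
    doc_words=["tumor", "growth", "tumor", "cell"],
    doc_topics=["neoplasm"],
    translation={"neoplasm": {"tumor": 0.5, "cancer": 0.5}},
    lam=0.5,
)
```

In this toy case the word "cancer" never occurs in the document but still receives nonzero probability through the topic signature "neoplasm", which is the effect the semantic smoothing is meant to achieve. The result sums to one only if each row of `translation` is itself a proper distribution.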
Keywords :
classification; data mining; information retrieval; pattern clustering; semantic networks; specification languages; text analysis; auto-learned semantic knowledge; dragon toolkit; feature matching; language modeling; large-scale text retrieval; text classification; text clustering; text mining; text semantics; topic signatures; unigram language models; Artificial intelligence; Data mining; Educational institutions; Information retrieval; Information science; Large-scale systems; Ontologies; Smoothing methods; Text categorization; Unified modeling language;
Conference_Title :
Tools with Artificial Intelligence, 2007. ICTAI 2007. 19th IEEE International Conference on
Conference_Location :
Patras
Print_ISBN :
978-0-7695-3015-4
DOI :
10.1109/ICTAI.2007.117