Title :
Uncertainty Reduction for Knowledge Discovery and Information Extraction on the World Wide Web
Author :
Ji, Heng ; Deng, Hongbo ; Han, Jiawei
Author_Institution :
Department of Computer Science, City University of New York, New York City, NY, USA
Abstract :
In this paper, we give an overview of knowledge discovery (KD) and information extraction (IE) techniques on the World Wide Web (WWW). We intend to answer the following questions: What additional uncertainty challenges does the WWW setting introduce for basic KD and IE techniques? What fundamental techniques can be used to reduce such uncertainty and achieve reasonable KD and IE performance on the WWW? What is the impact of each novel method? How can these techniques and information networks interact so that each benefits from the other? How can the results be used in more interesting applications? What challenges remain, and how might they be addressed? We hope this overview provides a road map for advancing KD and IE on the WWW to a higher level of performance, portability, and utilization.
Keywords :
Analytical models; Hidden Markov models; Natural language processing; Text analysis; Text mining; Text processing; Uncertainty; World Wide Web
Journal_Title :
Proceedings of the IEEE
DOI :
10.1109/JPROC.2012.2190489