DocumentCode :
3113113
Title :
Information extraction based on information fusion from multiple news sources from the web
Author :
Lv, Yang ; Ng, Wing W Y ; Lee, John W T ; Sun, Binbin ; Yeung, Daniel S.
Author_Institution :
Shenzhen Grad. Sch., Harbin Inst. of Technol., Shenzhen
fYear :
2008
fDate :
12-15 Oct. 2008
Firstpage :
1471
Lastpage :
1476
Abstract :
Traditional information extraction tools have been developed for years, but their extraction accuracy is still unsatisfactory, especially for named entity extraction. In this work, we analyze the reasons for this and propose a novel method to improve accuracy. Existing methods extract information from text collected from a single source, which makes it very difficult to extract exactly the information we need. On the Internet, one can easily find tens of sources for the same information (e.g., a particular news story). We therefore propose combining the information extracted from multiple sources using majority voting to find the required information. We use a change of CEO as an example event and extract the new CEO, the original CEO, and the company name for that event. An off-the-shelf named entity extraction tool is adopted; our major contribution is the fusion of the extraction results. Without fusion, a single news article yields many person and company names, so it is unclear who the new CEO is. Using our method, we output the two CEO names and the one company name. Experimental results show that our method achieves high accuracy in finding the exact information.
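The fusion step described in the abstract can be illustrated with a minimal Python sketch. It assumes that a per-source step (hypothetical, not shown) has already mapped each article's named entities to three slots of the "change of CEO" template; the slot names, the `majority_vote` function, and the sample names are illustrative assumptions, not the authors' implementation. Voting here is plurality voting (the candidate with the most votes wins), one common reading of "majority voting".

```python
from collections import Counter
from typing import Dict, List, Optional

# Hypothetical slot names for the "change of CEO" event template.
SLOTS = ("new_ceo", "original_ceo", "company")


def majority_vote(extractions: List[Dict[str, Optional[str]]]) -> Dict[str, Optional[str]]:
    """Fuse per-source slot guesses by plurality voting.

    Each element of `extractions` is the (assumed) output for one news source:
    a mapping from slot name to the candidate string that source supports,
    or None when the source gave no answer for that slot.
    """
    fused: Dict[str, Optional[str]] = {}
    for slot in SLOTS:
        # Each source casts at most one vote per slot; empty answers are skipped.
        candidates = [(e.get(slot) or "").strip() for e in extractions]
        votes = Counter(c for c in candidates if c)
        # most_common orders equal counts by first appearance, so ties go to
        # the candidate seen first; None if no source answered this slot.
        fused[slot] = votes.most_common(1)[0][0] if votes else None
    return fused


if __name__ == "__main__":
    # Illustrative, made-up extraction results from four news sources.
    sources = [
        {"new_ceo": "Alice Chen", "original_ceo": "Bob Li", "company": "Acme Corp"},
        {"new_ceo": "Alice Chen", "original_ceo": "Carol Wu", "company": "Acme Corp"},
        {"new_ceo": "Dan Ho", "original_ceo": "Bob Li", "company": "Acme"},
        {"new_ceo": "Alice Chen", "original_ceo": None, "company": "Acme Corp"},
    ]
    print(majority_vote(sources))
    # → {'new_ceo': 'Alice Chen', 'original_ceo': 'Bob Li', 'company': 'Acme Corp'}
```

Even though individual sources disagree (a wrong person in one, a shortened company name in another), agreement across sources recovers one answer per slot, which is the intuition behind fusing multiple sources rather than trusting a single article.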
Keywords :
Internet; information needs; information retrieval; sensor fusion; World Wide Web; information extraction; information fusion; information need; majority voting fusion method; multiple news source; off-the-shelf named entity extraction tool; Data mining; Humans; Laboratories; Search engines; Sun; Voting; Web pages; Web sites; Multiple Sources;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Systems, Man and Cybernetics, 2008. SMC 2008. IEEE International Conference on
Conference_Location :
Singapore
ISSN :
1062-922X
Print_ISBN :
978-1-4244-2383-5
Electronic_ISBN :
1062-922X
Type :
conf
DOI :
10.1109/ICSMC.2008.4811493
Filename :
4811493