DocumentCode :
3597316
Title :
Information Extraction for a scenario from multi-documents with RBFNN and L-GEM
Author :
Lai, Wei-wei ; Ng, Wing W Y ; Yeung, Daniel S. ; Bai, Xin-ru ; Li, Jin-cheng ; Sun, Bin-bin
Author_Institution :
Machine Learning & Cybern. Res. Center, South China Univ. of Technol., Guangzhou, China
Volume :
2
fYear :
2009
Firstpage :
1100
Lastpage :
1105
Abstract :
The goal of information extraction (IE) is to find specific information for a particular scenario in documents written in natural language. As IE methodologies have developed, many information extraction tools have been proposed, and they play an important role in information processing. However, the efficiency of these tools may not be satisfactory to users. One important reason is that most of these IE tools extract information from a single document. In this paper, we propose an extraction method that combines a current single-document named extraction (NE) tool with a multi-document radial basis function neural network (RBFNN) for multi-document IE. The RBFNN is trained by minimizing the localized generalization error model (L-GEM) to enhance its generalization capability. We collect a set of news pages from the Internet that report the same news. Each name of interest is taken to be the name most frequently extracted by the NE tool. Numbers and other information that cannot be extracted by the NE tool are extracted by the RBFNN using a pattern classification approach. The scenario of a company layoff is used as an example to show how we extract the corresponding company name, the company's major location, and the number of layoffs. Experimental results show that the proposed method is effective and accurate.
Keywords :
Internet; generalisation (artificial intelligence); information retrieval; learning (artificial intelligence); natural languages; pattern classification; radial basis function networks; IE; Internet; L-GEM; NE; RBFNN; company layoff scenario; information extraction; localized generalization error model; multidocument scenario; named extraction tool; natural language; neural network training; pattern classification approach; radial basis function neural network; Computer science; Cybernetics; Data mining; Information processing; Internet; Machine learning; Natural languages; Search engines; Semantic Web; Web pages; DF/IDF; Information Extraction; L-GEM; Multiple Documents; RBFNN;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Machine Learning and Cybernetics, 2009 International Conference on
Print_ISBN :
978-1-4244-3702-3
Electronic_ISBN :
978-1-4244-3703-0
Type :
conf
DOI :
10.1109/ICMLC.2009.5212380
Filename :
5212380