DocumentCode :
1399132
Title :
Data Extraction for Deep Web Using WordNet
Author :
Hong, Jer Lang
Author_Institution :
Sch. of Inf. Technol., Monash Univ., Bandar Sunway, Malaysia
Volume :
41
Issue :
6
fYear :
2011
Firstpage :
854
Lastpage :
868
Abstract :
Our survey shows that the techniques used for data extraction from the deep web need improvement if automatic wrappers are to be efficient and accurate. Further investigation indicates that a lightweight ontological technique built on WordNet, an existing lexical database for English, can check the similarity of data records and detect the correct data region with higher precision by using the semantic properties of these data records. This method has two advantages: it can extract three types of data records (single-section, multiple-section, and loosely structured), and it provides options for aligning iterative and disjunctive data items. Experimental results show that our technique is robust and performs better than existing state-of-the-art wrappers. Tests also show that our wrapper can extract data records from multilingual web pages and that it is domain independent.
Keywords :
Internet; information retrieval; natural languages; ontologies (artificial intelligence); English; WordNet; automatic wrappers; data extraction; deep Web; disjunctive data item; iterative data item; lexical database; lightweight ontological technique; loosely structured data record; multilingual Web pages; multiple section data record; semantic properties; similarity check; single section data record; Data mining; HTML; Ontologies; Semantics; Web pages; Automatic wrapper; deep web; ontology;
fLanguage :
English
Journal_Title :
Systems, Man, and Cybernetics, Part C: Applications and Reviews, IEEE Transactions on
Publisher :
IEEE
ISSN :
1094-6977
Type :
jour
DOI :
10.1109/TSMCC.2010.2089678
Filename :
5661858