DocumentCode
498575
Title
A Novel Method of Chinese Web Information Extraction and Applications
Author
Liu, Zhong ; Wang, Ying
Author_Institution
Chengdu Inst. of Comput. Applic., Chinese Acad. of Sci., Chengdu, China
Volume
1
fYear
2009
fDate
10-11 July 2009
Firstpage
65
Lastpage
68
Abstract
One promising application of natural language processing (NLP) research is information extraction (IE). In this paper, we present the workflow of our IE system for extracting semantically rich information from unstructured or semi-structured Chinese web pages. A knowledge engineering approach and an automatic training approach are used to extract patterns and build a knowledge repository. A general IE system requires that its unlabeled training Web pages be labeled. We develop a novel methodology that does not need labeled text. It includes hierarchy filtration pattern matching based on syntax in the best distance method, and maximum forward boundary recognition using an organization suffix repository and a part-of-speech tagging method. As an application of IE, we build a new IE-based system: an object-level vertical search system in which the objects are Chinese people, so IE is concerned with extracting people's attributes from a collection of web pages about Chinese people. The results are displayed as a hierarchical directory tree organized by people's attributes. The system lets users find people quickly and easily.
Keywords
Internet; knowledge engineering; natural language processing; Chinese web information extraction; Web pages; automatic training approach; distance method; filtration pattern matching; knowledge engineering approach; maximum forward boundary recognition; natural language processing research; object-level vertical search system; organization suffix repository; speech tagging method; Data mining; Filtration; Knowledge engineering; Natural language processing; Pattern matching; Pattern recognition; Speech recognition; Tagging; Text recognition; Web pages; information extraction (IE); machine learning (ML); natural language processing (NLP)
fLanguage
English
Publisher
IEEE
Conference_Titel
Information Engineering, 2009. ICIE '09. WASE International Conference on
Conference_Location
Taiyuan, Shanxi
Print_ISBN
978-0-7695-3679-8
Type
conf
DOI
10.1109/ICIE.2009.43
Filename
5211147