DocumentCode :
2394606
Title :
Study of Automatic Knowledge Extraction in Specific Chinese Language Domain
Author :
Zhang, Suxiang ; Li, Lei ; Zhong, Yixin
Author_Institution :
Sch. of Inf. Eng., Beijing Univ. of Posts & Telecommun.
fYear :
2006
fDate :
0-0 0
Firstpage :
281
Lastpage :
285
Abstract :
The paper presents hierarchy bootstrapping as an alternative approach to learning from a large quantity of unlabeled data in the Chinese language domain. It advocates using a small amount of seed information together with a large collection of easily obtained unlabeled data. Hierarchy bootstrapping initializes a learner with the seed information and then iteratively applies the learner to the unlabeled data. Two case studies of this approach are presented to address the problem of automatic knowledge extraction in information extraction (IE) systems. The first algorithm uses seed words and seed patterns to build a learner, which extracts further characteristic words using a scalar clusters method; these characteristic words are semantically similar to the seed words. The second algorithm then automatically learns further extraction patterns and adds them to the knowledge database, where they serve as a foundation for IE analysis. Experimental results are promising.
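The seed-then-iterate loop described in the abstract can be illustrated with a generic word/pattern bootstrapping sketch. This is not the authors' hierarchy bootstrapping algorithm: the "scalar clusters method" is replaced here by a plain co-occurrence count, a "pattern" is assumed to be just the (left, right) neighbouring tokens, the Chinese text is assumed to be already word-segmented (English placeholder tokens are used for readability), and all function and parameter names are hypothetical.

```python
from collections import Counter


def context(sentence, i):
    """Hypothetical pattern: the (left, right) neighbours of token i."""
    left = sentence[i - 1] if i > 0 else "<s>"
    right = sentence[i + 1] if i + 1 < len(sentence) else "</s>"
    return (left, right)


def bootstrap(corpus, seed_words, iterations=2, top_patterns=1, top_words=2):
    """Grow a word lexicon and a pattern set from seed words over unlabeled text.

    Stand-in scoring: a pattern scores one point per known lexicon word it
    surrounds; a candidate word scores one point per accepted pattern match.
    """
    lexicon = set(seed_words)
    patterns = set()
    for _ in range(iterations):
        # Step 1: learn patterns from contexts around known (seed) words.
        pattern_scores = Counter()
        for sent in corpus:
            for i, tok in enumerate(sent):
                if tok in lexicon:
                    pattern_scores[context(sent, i)] += 1
        new_patterns = [p for p, _ in pattern_scores.most_common()
                        if p not in patterns][:top_patterns]
        patterns.update(new_patterns)

        # Step 2: apply accepted patterns to unlabeled text to find new words.
        word_scores = Counter()
        for sent in corpus:
            for i, tok in enumerate(sent):
                if tok not in lexicon and context(sent, i) in patterns:
                    word_scores[tok] += 1
        lexicon.update(w for w, _ in word_scores.most_common(top_words))
    return lexicon, patterns


# Toy, pre-segmented "unlabeled" corpus (hypothetical data).
corpus = [
    ["company", "Huawei", "announced"],
    ["company", "Lenovo", "announced"],
    ["company", "ZTE", "announced"],
    ["the", "weather", "is"],
]
lexicon, patterns = bootstrap(corpus, {"Huawei"})
print(sorted(lexicon), patterns)
```

Starting from the single seed word, the first iteration accepts the context ("company", "announced") as a pattern and uses it to pick up two new words with the same distribution, while the unrelated sentence contributes nothing; the second iteration finds no new pattern and stops growing. A real system would need a stronger scoring function to keep semantic drift in check.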
Keywords :
knowledge acquisition; natural languages; text analysis; automatic knowledge extraction; characteristic words; extraction patterns; hierarchy bootstrapping; information extraction systems; knowledge database; scalar clusters method; seed patterns; seed words; specific Chinese language domain; unlabeled data; Algorithm design and analysis; Clustering algorithms; Data mining; Databases; Dictionaries; Knowledge engineering; Natural language processing; Natural languages; Pattern analysis; Text recognition;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Networking, Sensing and Control, 2006. ICNSC '06. Proceedings of the 2006 IEEE International Conference on
Conference_Location :
Ft. Lauderdale, FL
Print_ISBN :
1-4244-0065-1
Type :
conf
DOI :
10.1109/ICNSC.2006.1673158
Filename :
1673158