Title :
Efficient unsupervised extraction of words categories using symmetric patterns and high frequency words
Author :
Rong, Liu ; Zhiping, Zhang ; Ning, Pang
Author_Institution :
Foreign Language Coll., Taiyuan Univ. of Technol., Taiyuan, China
Abstract :
This paper presents a novel approach for discovering and extracting sets of words that share semantic meaning. We use meta-patterns of high frequency words and content words to discover pattern candidates. Symmetric patterns are then identified using graph-based measures, and word categories are created from graph clique sets. The method is pattern-based and requires no manually provided seed patterns or words. For Chinese, only part-of-speech (POS) tagging is carried out in advance. Computation time grows linearly with corpus size, so the method scales to large corpora. The results are judged favorable under manual evaluation.
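The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the meta-patterns, the graph-based measures, or any thresholds, so everything below is an assumption made for illustration. The function name `extract_categories`, the parameters `hfw_min_count` and `min_symmetry`, the single "content word, high frequency word, content word" meta-pattern, the edge-reversal symmetry ratio, and the restriction to 3-cliques are all hypothetical choices.

```python
from collections import Counter, defaultdict
from itertools import combinations


def extract_categories(sentences, hfw_min_count=5, min_symmetry=0.5):
    """Toy sketch: meta-pattern -> symmetric patterns -> 3-cliques.

    All thresholds and the symmetry measure are illustrative assumptions,
    not values taken from the paper.
    """
    counts = Counter(w for s in sentences for w in s)
    # Assumption: a word is "high frequency" if it occurs at least
    # hfw_min_count times; every other word is a content word.
    hfw = {w for w, c in counts.items() if c >= hfw_min_count}

    # Meta-pattern "C H C": collect directed edges (x, y) for each
    # pattern "x <high frequency word> y".
    edges = defaultdict(set)
    for s in sentences:
        for x, h, y in zip(s, s[1:], s[2:]):
            if h in hfw and x not in hfw and y not in hfw and x != y:
                edges[h].add((x, y))

    # Assumed symmetry measure: the share of a pattern's edges that also
    # occur in the reverse direction. Keep only sufficiently symmetric
    # patterns, and only their bidirectional edges.
    undirected = defaultdict(set)
    for h, es in edges.items():
        sym = {(x, y) for (x, y) in es if (y, x) in es}
        if len(sym) / len(es) >= min_symmetry:
            for x, y in sym:
                undirected[x].add(y)
                undirected[y].add(x)

    # Report 3-cliques of the resulting graph as candidate categories.
    cliques = set()
    for x, nbrs in list(undirected.items()):
        for y, z in combinations(sorted(nbrs), 2):
            if z in undirected[y]:
                cliques.add(frozenset((x, y, z)))
    return cliques


# Tiny hypothetical corpus, already tokenized.
corpus = [s.split() for s in [
    "cats and dogs", "dogs and cats",
    "dogs and birds", "birds and dogs",
    "cats and birds", "birds and cats",
]]
categories = extract_categories(corpus, hfw_min_count=5)
```

In this toy corpus only "and" reaches the frequency threshold, every "X and Y" edge also appears reversed, and the three content words form a single clique. A realistic run would need a large corpus, several meta-patterns, and, for Chinese, POS-tagged input as the abstract notes.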
Keywords :
graph theory; natural language processing; Chinese; POS; graph based measures; high frequency words; unsupervised words categories extraction; semantics; sharing semantic meaning; symmetric patterns; unsupervised
Conference_Title :
Artificial Intelligence and Education (ICAIE), 2010 International Conference on
Conference_Location :
Hangzhou
Print_ISBN :
978-1-4244-6935-2
DOI :
10.1109/ICAIE.2010.5641103