Title of article
Automatic thesaurus generation for Chinese documents
Author/Authors
Yuen-Hsien Tseng (author)
Issue Information
Monthly, serial issue, year 2002
Pages
9
From page
1130
To page
1138
Abstract
This article reports an approach to automatic thesaurus construction for Chinese documents. An effective Chinese keyword extraction algorithm is first presented. Experiments showed that for each document an average of 33% keywords unknown to a lexicon of 123,226 terms could be identified by this algorithm. Only 8.3% of these unregistered words are illegal. Keywords extracted from each document are further filtered for term association analysis. Association weights larger than a threshold are then accumulated over all the documents to yield the final term pair similarities. Compared to previous studies, this method speeds up the thesaurus generation process drastically. It also achieves a similar percentage level of term relatedness.
Journal title
Journal of the American Society for Information Science and Technology
Serial Year
2002
Record number
993299