Title :
Generation of a pseudothesaurus for information retrieval based on cooccurrences and fuzzy set operations
Author :
Miyamoto, Sadaaki ; Miyake, Tetsuo ; Nakayama, Keisuke
Author_Institution :
Inst. of Information Sci. & Electronics, Univ. of Tsukuba, Ibaraki, Japan
Abstract :
A thesaurus in bibliographic information retrieval is a list of technical terms with relations among them, enabling generic retrieval of documents that have different but related keywords. Since constructing a thesaurus is resource-consuming, a method for automatically generating a thesaurus-like structure is needed. A set-theoretical model of an abstract thesaurus is developed and related to an automatic generation method based on cooccurrences of terms in a set of texts. Replacing a basis set in the model and transforming cooccurrence frequencies into fuzzy sets enable the transition from the abstract mathematical model to an actual procedure for automatic generation. The generated structure is called a pseudothesaurus. An algorithm that generates the pseudothesaurus from a large amount of data is developed. Two examples are given, one based on a dictionary of scientific usage and one on an actual bibliographic database.
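The idea of turning cooccurrence frequencies into fuzzy sets can be illustrated with a minimal sketch. It is not taken from the paper: the function names, the whitespace tokenization, and the two measures (a symmetric "related" degree computed as fuzzy intersection over fuzzy union, and an asymmetric "broader" degree computed as fuzzy intersection over the size of the first set, with min and max as the fuzzy operations) are assumptions chosen for illustration.

```python
from collections import Counter


def term_frequencies(texts):
    """Map each term to a Counter {text index: occurrence count}.

    Each Counter is treated as a fuzzy (multi)set over the texts.
    Tokenization by whitespace is an assumption of this sketch.
    """
    freq = {}
    for i, text in enumerate(texts):
        for term in text.lower().split():
            freq.setdefault(term, Counter())[i] += 1
    return freq


def related_degree(freq, a, b):
    """Hypothetical symmetric measure: |A intersect B| / |A union B|,
    with min as intersection and max as union."""
    fa, fb = freq[a], freq[b]
    keys = set(fa) | set(fb)
    inter = sum(min(fa[k], fb[k]) for k in keys)
    union = sum(max(fa[k], fb[k]) for k in keys)
    return inter / union if union else 0.0


def broader_degree(freq, a, b):
    """Hypothetical asymmetric measure: |A intersect B| / |A|.

    A value near 1 means b occurs almost wherever a occurs,
    which this sketch reads as "b is broader than a".
    """
    fa, fb = freq[a], freq[b]
    inter = sum(min(fa[k], fb[k]) for k in fa)
    size_a = sum(fa.values())
    return inter / size_a if size_a else 0.0
```

Under these assumptions, a pair of terms with a high `related_degree` would be linked as related terms, and a pair with a high `broader_degree` in only one direction would be linked as narrower and broader terms; the thresholds are left to the user.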
Keywords :
fuzzy set theory; thesauri; bibliographic database; bibliographic information retrieval; cooccurrences; dictionary; fuzzy set; pseudothesaurus; Abstracts; Algorithm design and analysis; Chemicals; Information retrieval; Mathematical model; Thesauri
Journal_Title :
Systems, Man and Cybernetics, IEEE Transactions on
DOI :
10.1109/TSMC.1983.6313030