DocumentCode :
3081094
Title :
Formal concept analysis and document clustering via granular computing
Author :
Tsau Young Lin ; I-Jen Chiang
Author_Institution :
San Jose State Univ, San Jose
Volume :
6
fYear :
2006
fDate :
8-11 Oct. 2006
Abstract :
A text/web document is a knowledge representation of a human idea (a structured set of thoughts). This paper refines TFIDF and extended TFIDF (ETFIDF) [16]; these values measure the co-occurrence of tokens. ETFIDF captures the semantics more accurately. Tokens with high TFIDF values are called keywords. Sets of (n+1) co-occurring keywords with high ETFIDF values are called n-granules. The collection of keywords and n-granules can be interpreted geometrically: they form a non-closed simplicial complex. The corresponding non-closed polyhedron is called the latent semantic space (LSS). The LSS is a geometric knowledge base that provides semantics to a search engine.
Keywords :
knowledge representation; pattern clustering; search engines; text analysis; Web document; document clustering; extended TFIDF; formal concept analysis; geometric knowledge base; granular computing; keywords; knowledge representation; latent semantic space; search engine; text document; token cooccurrences; Computer science; Cybernetics; Extraterrestrial measurements; Humans; Knowledge representation; Search engines; Set theory; Text analysis; Topology; Uncertainty; Keyword; Latent semantic space; granules; simplex;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Systems, Man and Cybernetics, 2006. SMC '06. IEEE International Conference on
Conference_Location :
Taipei
Print_ISBN :
1-4244-0099-6
Type :
conf
DOI :
10.1109/ICSMC.2006.385058
Filename :
4274667