DocumentCode :
3400237
Title :
Unsupervised Contextual Keyword Relevance Learning and Measurement using PLSA
Author :
Sudarsun, S. ; Kalaivendhan, Dalou ; Venkateswarlu, M.
Author_Institution :
Checktronix India Pvt. Ltd., Chennai
fYear :
2006
fDate :
15-17 Sept. 2006
Firstpage :
1
Lastpage :
6
Abstract :
In this paper, we develop a probabilistic approach using PLSA for the discovery and analysis of contextual keyword relevance, based on the distribution of keywords across a training text corpus. We show experimentally the flexibility of this approach in classifying keywords into different domains based on their context. We have developed a prototype system that projects keyword queries onto a loaded PLSA model and returns closely correlated keywords. The keyword query is vectorized using the PLSA model in the reduced aspect space, and correlation is derived by calculating a dot product. We also discuss the parameters that control PLSA performance, including a) the number of aspects, b) the number of EM iterations, and c) weighting functions on the term-document matrix (TDM), i.e. pre-weighting. We estimate quality by computing precision-recall scores. We also present our experiments on applying PLSA to document classification.
Keywords :
classification; data mining; information retrieval; probability; singular value decomposition; text analysis; unsupervised learning; contextual keyword relevance discovery; document classification; probabilistic latent semantic analysis; term document matrix; text corpus; Application software; Association rules; Computer networks; Context modeling; Helium; Information analysis; Machine assisted indexing; Vectors; Aspect model; Keyword Relevance; PLSA; Polysemy; SVD; Synonymy; Unsupervised Clustering
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
India Conference, 2006 Annual IEEE
Conference_Location :
New Delhi
Print_ISBN :
1-4244-0369-3
Electronic_ISBN :
1-4244-0370-7
Type :
conf
DOI :
10.1109/INDCON.2006.302787
Filename :
4086258