DocumentCode :
3400249
Title :
Role of Weighting on TDM in Improvising Performance of LSA on Text Data
Author :
Sudarsun, S ; Prabhu, G ; Kumar, V
Author_Institution :
Checktronix India Pvt. Ltd., Chennai
fYear :
2006
fDate :
Sept. 2006
Firstpage :
1
Lastpage :
6
Abstract :
In this paper, we show that the efficiency of LSA is significantly controlled by the choice of weighting algorithm applied. These weighting algorithms assign relative importance to document attributes (e.g. keywords) based on their occurrences in the corpus. The effects of different weighting algorithms are the central point of this paper. We experimented with various weighting algorithms to evaluate and study their effects, as measured by precision and recall. Our experiments include applying weighting functions to the term-document matrix (TDM), known as pre-weighting, in order to increase or decrease the relative importance of words based on their occurrence. We also evaluated applying weighting functions to the projected query (post-weighting). Post-weighted keyword queries were projected onto an LSA model built on the pre-weighted TDM to obtain closely correlated keywords or a document (keyword collection). We have developed a prototype IR query projection tool that projects keyword queries onto the LSA model to retrieve relevant keywords with a floating-point score.
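The abstract describes a pipeline: pre-weight the TDM, factor it with SVD, post-weight and project a keyword query, then score matches and measure precision and recall. The NumPy sketch that follows illustrates that pipeline under stated assumptions only. It uses IDF, log(N/df), as the single weighting function for both pre- and post-weighting, the standard LSA fold-in q_hat = S_k^-1 U_k^T q, and cosine similarity for scoring. These choices, the toy matrix, and every function name (`idf_weights`, `build_lsa`, `project_query`, `rank_documents`, `precision_recall`) are hypothetical illustrations, not the authors' tool or their exact weighting schemes.

```python
import numpy as np


def idf_weights(tdm):
    """IDF per term (row): log(N / df). Assumes every term occurs in at least one document."""
    n_docs = tdm.shape[1]
    df = np.count_nonzero(tdm, axis=1)
    return np.log(n_docs / df)


def build_lsa(tdm, k):
    """Pre-weighting: scale each term row by its IDF, then keep the top-k singular triplets."""
    idf = idf_weights(tdm)
    weighted = tdm * idf[:, None]
    u, s, vt = np.linalg.svd(weighted, full_matrices=False)
    # V_k has one row per document.
    return idf, u[:, :k], s[:k], vt[:k, :].T


def project_query(q_counts, idf, u_k, s_k):
    """Post-weighting (assumed to reuse the same IDF), then fold in: q_hat = S_k^-1 U_k^T q."""
    q = q_counts * idf
    return (u_k.T @ q) / s_k


def rank_documents(q_hat, s_k, v_k):
    """Cosine similarity between the folded-in query and each document, both scaled by S_k."""
    docs = v_k * s_k
    query = q_hat * s_k
    norms = np.linalg.norm(docs, axis=1) * np.linalg.norm(query)
    scores = (docs @ query) / np.where(norms == 0, 1.0, norms)
    # Document indices ordered from best to worst score.
    return np.argsort(-scores), scores


def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved, recall = hits / relevant (both lists must be non-empty)."""
    hits = len(set(retrieved) & set(relevant))
    return hits / len(retrieved), hits / len(relevant)


# Hypothetical toy TDM: rows are terms, columns are 4 documents.
terms = ["svd", "matrix", "query", "retrieval", "entropy"]
tdm = np.array([
    [2, 1, 0, 0],  # svd
    [1, 2, 0, 0],  # matrix
    [0, 0, 2, 1],  # query
    [0, 1, 1, 2],  # retrieval
    [1, 0, 0, 1],  # entropy
], dtype=float)

idf, u_k, s_k, v_k = build_lsa(tdm, k=2)
# Keyword query "query retrieval" as raw term counts.
q_hat = project_query(np.array([0, 0, 1, 1, 0], dtype=float), idf, u_k, s_k)
order, scores = rank_documents(q_hat, s_k, v_k)
top2 = order[:2].tolist()
p, r = precision_recall(top2, relevant=[2, 3])
```

Swapping `idf_weights` for another scheme (for example an entropy-based weight) changes only the pre- and post-weighting steps. That separation mirrors the paper's framing of weighting as the variable under study.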
Keywords :
information retrieval; singular value decomposition; text analysis; information retrieval query projection tool; keyword query; latent semantic analysis; term document matrix; text data; weighting function; Content based retrieval; Data mining; Entropy; Frequency; Prototypes; Testing; IDF; IWF; LSA; NDV; Precision; Recall; SVD; TDM; Weighting Functions
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
2006 Annual IEEE India Conference
Conference_Location :
New Delhi
Print_ISBN :
1-4244-0369-3
Electronic_ISBN :
1-4244-0370-7
Type :
conf
DOI :
10.1109/INDCON.2006.302788
Filename :
4086259