Title :
Analysis of algorithms used to compute term discrimination values
Author :
Pushpalatha, K.P. ; Raju, G.
Author_Institution :
Sch. of Comput. Sci., Mahatma Gandhi Univ., Kottayam, India
Abstract :
Nowadays almost all areas of life use the Internet and search engines to obtain relevant and useful information on various topics. Automatic document search and retrieval from large document collections requires large index databases. Term weighting schemes are effective at identifying and selecting good indexing terms, but more efficient indexing terms can be generated using term discrimination values based on term weighting measures. The sum of the similarity coefficients between pairs of documents, computed for each term, determines the document space density for a collection of documents. Good discriminating terms are those whose inclusion in or elimination from the documents in a collection produces a large change in the document space density. This change reflects the difference between pairs of documents and in turn provides a discrimination measure. An efficient search index can be created from such good discriminating terms, so that precision and recall rates can be improved. This paper presents a study and analysis of a set of algorithms that compute and use term discrimination values (TDV) to identify good discriminators and, in turn, to create a good search index. It is recognized that there is a crucial relationship between term frequencies and discrimination values. Discrimination values also depend on the type of measure used to determine the similarity coefficients.
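The density-based idea described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm: it assumes cosine similarity as the similarity coefficient, takes the document space density to be the plain sum of pairwise similarities, and assumes the sign convention of the classic discrimination value model (density with the term removed minus density with all terms, so a positive value marks a good discriminator). The function names and the toy term-frequency vectors are hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_u * norm_v)


def space_density(doc_vectors):
    """Sum of similarity coefficients over all document pairs."""
    return sum(cosine(u, v) for u, v in combinations(doc_vectors, 2))


def term_discrimination_values(doc_vectors):
    """TDV per term: density without the term minus density with it.

    Assumed convention: a positive value means that removing the term
    packs the documents closer together, so the term helps tell them apart.
    """
    n_terms = len(doc_vectors[0])
    base = space_density(doc_vectors)
    values = []
    for k in range(n_terms):
        # Drop column k from every document vector.
        reduced = [vec[:k] + vec[k + 1:] for vec in doc_vectors]
        values.append(space_density(reduced) - base)
    return values


# Hypothetical toy collection: rows are documents, columns are terms.
# Term 0 occurs in every document; terms 1 and 2 occur in one document each.
docs = [
    [1, 1, 0],
    [1, 0, 1],
    [1, 0, 0],
]
tdv = term_discrimination_values(docs)
print(tdv)
```

In this toy collection the term shared by every document receives a negative value, while the two terms that each occur in a single document receive positive values. Recomputing the density once per term, as done here, costs on the order of the number of terms times the number of document pairs, and swapping `cosine` for another similarity coefficient changes the resulting values.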
Keywords :
Internet; data mining; database indexing; information retrieval; search engines; automatic document search; document collection; document pair; document retrieval; document space density; indexing term; large index database; search engine; search index; similarity coefficient; term discrimination value; Algorithm design and analysis; Approximation algorithms; Classification algorithms; Clustering algorithms; Complexity theory; Indexes; Vocabulary; TDV; Text mining; discrimination value model
Conference_Title :
Computational Intelligence and Computing Research (ICCIC), 2010 IEEE International Conference on
Conference_Location :
Coimbatore
Print_ISBN :
978-1-4244-5965-0
Electronic_ISBN :
978-1-4244-5967-4
DOI :
10.1109/ICCIC.2010.5705844