DocumentCode :
2735273
Title :
Enhancing information retrieval efficiency using semantic-based-combined-similarity-measure
Author :
Saini, Mayank ; Sharma, Dharmendar ; Gupta, P.K.
Author_Institution :
Sch. of Comput. & Syst. Sci., Jawaharlal Nehru Univ., New Delhi, India
fYear :
2011
fDate :
3-5 Nov. 2011
Firstpage :
1
Lastpage :
4
Abstract :
Most knowledge-intensive organizations keep their information in large text document repositories, and most of these repositories and databases are either unstructured or semi-structured. Recently, various soft computing techniques have been used to improve information retrieval efficiency. More specifically, genetic algorithms have been applied to several information retrieval components, such as matching function learning, document clustering, information extraction, and query optimization [1 - 6]. In most cases, the matching function in information retrieval is based on term frequency. The problem with this approach is that the syntactic information of the text document is lost and phrases are not considered, which results in poor accuracy. In this paper we propose a new semantic-based similarity measure in which each term can be a phrase or a single word, and the weight assigned to each term is based on its semantic importance within each sentence. We use this semantic similarity measure along with other standard similarity measures, such as Jaccard and cosine, to form the semantic-based-combined-similarity-measure. A standard genetic algorithm is used to optimize the weight given to each similarity measure.
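The combination described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the formula for the semantic measure or for the combination, so the code assumes a normalized weighted sum and takes the semantic score as a precomputed number. The names `jaccard`, `cosine`, `combined_similarity`, `weights`, and `semantic_score` are all hypothetical, and the weights are fixed by hand here rather than tuned by a genetic algorithm.

```python
import math
from collections import Counter


def jaccard(a_tokens, b_tokens):
    """Jaccard similarity: |A ∩ B| / |A ∪ B| over the sets of terms."""
    a, b = set(a_tokens), set(b_tokens)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cosine(a_tokens, b_tokens):
    """Cosine similarity between raw term-frequency vectors."""
    ca, cb = Counter(a_tokens), Counter(b_tokens)
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def combined_similarity(query, doc, weights, semantic_score):
    """Weighted combination of three similarity scores.

    ASSUMPTION: a normalized weighted sum; the abstract does not state
    the exact combination rule. `semantic_score` stands in for the
    paper's semantic measure, whose definition is not given in the
    abstract, so it is passed in as an already computed value in [0, 1].
    `weights` is (w_jaccard, w_cosine, w_semantic); in the paper these
    are the values a genetic algorithm would optimize.
    """
    w_j, w_c, w_s = weights
    total = w_j + w_c + w_s
    if total == 0:
        raise ValueError("at least one weight must be non-zero")
    score = (
        w_j * jaccard(query, doc)
        + w_c * cosine(query, doc)
        + w_s * semantic_score
    )
    return score / total


if __name__ == "__main__":
    query = "genetic algorithm retrieval".split()
    doc = "genetic algorithm for information retrieval".split()
    # The semantic score of 0.5 is an arbitrary illustrative value.
    print(combined_similarity(query, doc, (1.0, 1.0, 1.0), 0.5))
```

A genetic algorithm would treat the weight triple as a chromosome and use a retrieval-quality metric over a set of queries as the fitness function; that step is left out here because the abstract does not describe the fitness function or the genetic operators.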
Keywords :
genetic algorithms; learning (artificial intelligence); neural nets; query processing; text analysis; Jaccard; documents clustering; genetic algorithms; information extraction; information retrieval efficiency enhancement; information retrieval matching function; knowledge intensive organizations; matching function learning; query optimization; semantic-based-combined-similarity-measure; soft computing techniques; term frequency; text databases; text document repositories; Data mining; Databases; Genetic algorithms; Information processing; Information retrieval; Semantics; Weight measurement; Information retrieval; genetic algorithm; semantic similarity measure;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Image Information Processing (ICIIP), 2011 International Conference on
Conference_Location :
Himachal Pradesh
Print_ISBN :
978-1-61284-859-4
Type :
conf
DOI :
10.1109/ICIIP.2011.6108982
Filename :
6108982