Title :
Similarity measure for semi-structured information retrieval based on the path and neighborhood
Author :
Latreche, Amina ; Guezouli, Larbi
Author_Institution :
Comput. Sci. Dept., Univ. of Batna, Batna, Algeria
Abstract :
With the appearance of semi-structured documents such as XML documents, information retrieval has become more challenging because of the structural information these documents introduce, which has a complex representation. An information retrieval system must organize and store information and then return the documents that correspond to the user's information needs. Such systems are based on information retrieval models and use similarity measures that take both structural and textual information into account. This paper presents a new similarity measure inspired by that of the CASIT model (French: CAlcul de SImilarité Textuelle, i.e. textual similarity computation). It is adapted to semi-structured documents, specifically XML documents. The measure computes a rate of resemblance between a query XML document and each document of an XML database by generating an interference wave that represents the presence and importance of the query document's vocabulary in each database document. Two key notions are used: the neighborhood, which allows terms to be weighted, and the path of tags followed to reach lexical units. The similarity measure has been used in a semi-structured information retrieval system that we implemented. We used an experimental XML database and took running time as the evaluation criterion; the running time is linear, which makes the use of a very large database feasible. The system was then evaluated in terms of quality and relevance of answers using recall/precision.
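The abstract does not give the CASIT-derived formula itself, so the following Python sketch is only a rough illustration of how a path- and neighborhood-aware measure of this kind could be wired together. Every function name here (lexical_units, path_similarity, neighborhood_weight, interference_wave, similarity, recall_precision) is hypothetical, and the specific choices are assumptions, not the authors' method: the path score is the common-prefix ratio of two tag paths, a term's neighborhood weight is 1 plus the number of other query terms within a fixed window, the "interference wave" is modeled as a per-query-term score, and the final rate clips each score against the query compared with itself so that it falls in [0, 1].

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

TOKEN = re.compile(r"\w+")

def lexical_units(xml_text):
    """Return (tag path, tokens) for each element carrying text (tails ignored)."""
    units = []

    def walk(elem, path):
        path = path + (elem.tag,)
        tokens = TOKEN.findall((elem.text or "").lower())
        if tokens:
            units.append((path, tokens))
        for child in elem:
            walk(child, path)

    walk(ET.fromstring(xml_text), ())
    return units

def path_similarity(p, q):
    """Assumed structural score: shared prefix length / longer path length.
    Paths always contain at least the root tag, so the divisor is never 0."""
    common = 0
    for a, b in zip(p, q):
        if a != b:
            break
        common += 1
    return common / max(len(p), len(q))

def neighborhood_weight(tokens, i, vocabulary, radius=2):
    """Assumed weighting: 1 + number of distinct other query terms within
    `radius` positions of tokens[i] in the same element."""
    lo, hi = max(0, i - radius), min(len(tokens), i + radius + 1)
    neighbors = {
        tokens[j] for j in range(lo, hi)
        if j != i and tokens[j] in vocabulary and tokens[j] != tokens[i]
    }
    return 1 + len(neighbors)

def interference_wave(query_units, doc_units, radius=2):
    """Hypothetical per-query-term score of one document."""
    query_paths = defaultdict(set)  # query term -> tag paths where it occurs
    for path, tokens in query_units:
        for t in tokens:
            query_paths[t].add(path)
    vocabulary = set(query_paths)

    wave = dict.fromkeys(vocabulary, 0.0)
    for path, tokens in doc_units:
        for i, t in enumerate(tokens):
            if t in vocabulary:
                structural = max(path_similarity(path, qp) for qp in query_paths[t])
                wave[t] += neighborhood_weight(tokens, i, vocabulary, radius) * structural
    return wave

def similarity(query_xml, doc_xml, radius=2):
    """Resemblance rate in [0, 1]: the document's wave clipped by the
    query's own wave (query vs. itself), normalized by the latter."""
    q_units = lexical_units(query_xml)
    reference = interference_wave(q_units, q_units, radius)
    wave = interference_wave(q_units, lexical_units(doc_xml), radius)
    total = sum(reference.values())
    if total == 0:
        return 0.0
    return sum(min(wave[t], reference[t]) for t in reference) / total

def recall_precision(retrieved, relevant):
    """Standard set-based recall and precision."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(retrieved) if retrieved else 0.0
    return recall, precision

if __name__ == "__main__":
    query = "<article><title>xml retrieval</title></article>"
    docs = {
        "d1": "<article><title>xml retrieval models</title>"
              "<body>path based xml retrieval</body></article>",
        "d2": "<article><body>relational databases</body></article>",
        "d3": "<article><body>xml</body></article>",
    }
    ranking = sorted(docs, key=lambda d: similarity(query, docs[d]), reverse=True)
    print(ranking)  # prints ['d1', 'd3', 'd2']
```

In this toy run, d3 contains one query term, but under a different tag path (body instead of title) and with no neighboring query terms, so it scores 0.125, while d2 shares no vocabulary and scores 0. Each document is scanned once per query, which is consistent with the linear running time reported above, though the sketch makes no claim to reproduce the paper's measurements.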
Keywords :
XML; information needs; information retrieval; text analysis; CASIT model; XML database; XML documents; information research; interference wave; semistructured documents; semistructured information retrieval; similarity measures; structural information; textual information; user information needs; Educational institutions; Indexing; Interference; Vectors; Similarity measure; path; semi-structured information;
Conference_Title :
Information Technology and e-Services (ICITeS), 2012 International Conference on
Conference_Location :
Sousse
Print_ISBN :
978-1-4673-1167-0
DOI :
10.1109/ICITeS.2012.6216597