DocumentCode :
2300542
Title :
Contextual information retrieval based on algorithmic information theory and statistical outlier detection
Author :
Martinez, Rafael ; Cebrián, Manuel ; De Borja Rodríguez, Francisco ; Camacho, David
Author_Institution :
Dept. de Ing. Inf., Univ. Autonoma de Madrid, Madrid
fYear :
2008
fDate :
5-9 May 2008
Firstpage :
292
Lastpage :
297
Abstract :
This work presents an Information Retrieval technique based on algorithmic information theory (using the normalized compression distance), statistical data outlier detection, and a novel database structure. The paper shows how these components can be integrated to retrieve information from generic databases using long text-based queries. Two important problems are addressed. On the one hand, we analyze and try to solve the detection of a particular case of false positives: when the distance between two documents is outlyingly low but there is no actual similarity. On the other hand, we propose a way to structure the database such that the similarity distance estimation scales well with the length of the query. All design choices are justified with an experimental evaluation.
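The abstract combines two computational ingredients: the normalized compression distance (NCD) and statistical outlier detection over the resulting distances. The sketch below illustrates both under stated assumptions and is not the authors' implementation. It assumes zlib as the compressor and a simple z-score rule (flag distances more than `k` standard deviations below the mean). The function names `ncd` and `low_outliers` and the threshold `k` are hypothetical. NCD follows its standard definition, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is compressed length. The paper's database structure and its criterion for deciding whether a flagged match is a false positive are not described in the abstract and are not reproduced here.

```python
import statistics
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data (assumed compressor, level 9)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance:
    (C(xy) - min(C(x), C(y))) / max(C(x), C(y)).
    Values near 0 suggest shared structure; values near 1 suggest little."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def low_outliers(distances: dict[str, float], k: float = 2.0) -> list[str]:
    """Hypothetical outlier rule: flag documents whose distance to the query
    lies more than k population standard deviations below the mean distance.
    Flagged documents are candidates for a closer false-positive check."""
    values = list(distances.values())
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return []  # all distances equal: nothing stands out
    return [doc for doc, d in distances.items() if d < mean - k * sd]


if __name__ == "__main__":
    # Toy corpus: one document repeats the query's wording, the others do not.
    query = b"compression based similarity for long text queries " * 10
    corpus = {
        "near_copy": b"compression based similarity for long text queries " * 9,
        "other_1": b"the weather in porto during may is usually mild " * 10,
        "other_2": b"databases store records that are indexed for lookup " * 10,
        "other_3": b"music information retrieval uses audio features " * 10,
    }
    dists = {name: ncd(query, doc) for name, doc in corpus.items()}
    for name, d in sorted(dists.items(), key=lambda kv: kv[1]):
        print(f"{name}: {d:.3f}")
    # With only four documents a strict threshold rarely fires; k=1.0 is used here.
    print("low outliers:", low_outliers(dists, k=1.0))
```

A z-score rule is used only because it is the simplest statistical outlier test to show; a robust alternative (for example, a median/IQR rule) may behave better when the distance distribution is skewed or the candidate set is small.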
Keywords :
information retrieval; information theory; text analysis; algorithmic information theory; contextual information retrieval; generic databases; long text-based queries; statistical data outlier detection; Computer science; Databases; Information retrieval; Information theory; Music information retrieval; Pattern recognition; Search engines; Space technology; Statistics; Text analysis;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Information Theory Workshop, 2008. ITW '08. IEEE
Conference_Location :
Porto
Print_ISBN :
978-1-4244-2269-2
Electronic_ISBN :
978-1-4244-2271-5
Type :
conf
DOI :
10.1109/ITW.2008.4578672
Filename :
4578672