Title :
Data structures for information retrieval
Author :
Nkweteyim, Denis L.
Author_Institution :
Univ. of Buea, Buea, Cameroon
Abstract :
The process of efficiently indexing large document collections for information retrieval places heavy demands on a computer's memory and processor, and requires judicious use of these resources. In this paper, we describe our approach to constructing such an index based on the vector-space model (VSM). We review the stages involved in generating an index, weighting the index terms, and representing documents in the VSM. We explain our choice of data structures at each step, from parsing the document collection, through generating index terms, to generating document representations, and we discuss the tradeoffs involved in these choices. We then demonstrate the approach using the OHSUMED data set. Our results show that even with a modest amount of main memory (4 GB), large data sets such as OHSUMED can be indexed quickly.
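The abstract does not give implementation details. The sketch below is therefore only an illustration of the kind of pipeline it outlines, built from the structures named in the keywords (binary search tree, dictionary, linked list, posting, term frequency). It is not the authors' implementation. The following are all assumptions: the class and function names (`Posting`, `TermNode`, `Dictionary`, `build_vectors`), the lowercase alphanumeric tokenizer, and the tf·log(N/df) weighting.

```python
import math
import re
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class Posting:
    """Node of a singly linked posting list: a document id and the term's frequency in it."""
    doc_id: int
    tf: int
    next: Optional["Posting"] = None


@dataclass
class TermNode:
    """Node of an unbalanced binary search tree keyed by term (hypothetical layout)."""
    term: str
    df: int = 0                       # number of documents containing the term
    head: Optional[Posting] = None    # first posting
    tail: Optional[Posting] = None    # last posting, for O(1) appends
    left: Optional["TermNode"] = None
    right: Optional["TermNode"] = None


class Dictionary:
    """Hypothetical term dictionary: a BST of terms, each pointing to its posting list."""

    def __init__(self) -> None:
        self.root: Optional[TermNode] = None

    def _find_or_insert(self, term: str) -> TermNode:
        if self.root is None:
            self.root = TermNode(term)
            return self.root
        node = self.root
        while True:
            if term == node.term:
                return node
            if term < node.term:
                if node.left is None:
                    node.left = TermNode(term)
                    return node.left
                node = node.left
            else:
                if node.right is None:
                    node.right = TermNode(term)
                    return node.right
                node = node.right

    def add(self, term: str, doc_id: int) -> None:
        """Record one occurrence of term; documents must be processed one at a time, in order."""
        node = self._find_or_insert(term)
        if node.tail is not None and node.tail.doc_id == doc_id:
            node.tail.tf += 1  # same document as the last posting: bump term frequency
        else:
            posting = Posting(doc_id, 1)
            if node.tail is None:
                node.head = posting
            else:
                node.tail.next = posting
            node.tail = posting
            node.df += 1  # first occurrence of the term in a new document

    def in_order(self) -> Iterator[TermNode]:
        """Yield term nodes in sorted term order (iterative in-order traversal)."""
        stack: list[TermNode] = []
        node = self.root
        while stack or node is not None:
            while node is not None:
                stack.append(node)
                node = node.left
            node = stack.pop()
            yield node
            node = node.right


def build_vectors(docs: list[str]) -> list[dict[str, float]]:
    """Parse docs, build the dictionary, and return one sparse tf-idf vector per document."""
    dictionary = Dictionary()
    for doc_id, text in enumerate(docs):
        for token in re.findall(r"[a-z0-9]+", text.lower()):
            dictionary.add(token, doc_id)
    n = len(docs)
    vectors: list[dict[str, float]] = [{} for _ in docs]
    for node in dictionary.in_order():
        idf = math.log(n / node.df)  # assumed weighting: w = tf * log(N / df)
        posting = node.head
        while posting is not None:
            vectors[posting.doc_id][node.term] = posting.tf * idf
            posting = posting.next
    return vectors
```

For example, `build_vectors(["heart attack risk", "heart surgery outcomes", "cancer risk factors"])` gives the first document the weight log 3 for "attack" and log 1.5 each for "heart" and "risk". The BST keeps the dictionary sorted without a separate sort step. However, an unbalanced tree degrades to a linked list if terms arrive in sorted order. The tail pointer makes each posting append O(1) as long as documents are parsed sequentially. Under this weighting, a term that occurs in every document receives weight 0.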
Keywords :
data structures; indexing; information retrieval; OHSUMED data set; VSM; document representation; parsing; vector-space model; Computational modeling; Dictionaries; Indexes; Random access memory; Vectors; binary search tree; dictionary; index; linked list; posting; term frequency
Conference_Title :
IST-Africa Conference Proceedings, 2014
Conference_Location :
Le Meridien Ile Maurice
Print_ISBN :
978-1-905824-43-4
DOI :
10.1109/ISTAFRICA.2014.6880643