Data structures for information retrieval

Author

Nkweteyim, Denis L.

Author_Institution

Univ. of Buea, Buea, Cameroon

fYear

2014

fDate

7-9 May 2014

Firstpage

1

Lastpage

8

Abstract

The process of efficiently indexing large document collections for information retrieval places large demands on a computer's memory and processor, and requires judicious use of these resources. In this paper, we describe our approach to constructing such an index based on the vector-space model (VSM). We review the stages involved in generating an index, weighting the index terms, and representing documents in the VSM. We explain our choice of data structures, and the tradeoffs behind it, from the parsing of the document collection through the generation of index terms to the generation of document representations. We then demonstrate the approach using the OHSUMED data set. Our results show that even with only a modest amount of main memory (4 GB), large data sets such as the OHSUMED data set can be quickly indexed.
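
The stages named in the abstract (parsing, index-term generation, term weighting, and document representation) can be illustrated with a minimal sketch. This is not the paper's implementation: Python dictionaries stand in for the data structures listed in the keywords (binary search tree, linked list), the tokenizer is a naive whitespace split, and the weighting tf × log(N/df) is an assumed scheme, since the abstract does not say which one the author uses. The function names and the sample documents are hypothetical.

```python
import math
from collections import Counter, defaultdict


def build_index(docs):
    """Build an inverted index mapping term -> {doc_id: term frequency}."""
    postings = defaultdict(dict)
    for doc_id, text in docs.items():
        # Naive whitespace tokenizer (an assumption; real parsers do more).
        for term, tf in Counter(text.lower().split()).items():
            postings[term][doc_id] = tf
    return postings


def tfidf_vectors(postings, n_docs):
    """Turn postings into sparse per-document vectors of tf * log(N/df)."""
    vectors = defaultdict(dict)
    for term, plist in postings.items():
        # Document frequency is the number of postings for the term.
        idf = math.log(n_docs / len(plist))
        for doc_id, tf in plist.items():
            vectors[doc_id][term] = tf * idf
    return vectors


# Hypothetical three-document collection, for illustration only.
docs = {
    1: "heart disease risk",
    2: "heart surgery",
    3: "lung disease",
}
postings = build_index(docs)
vectors = tfidf_vectors(postings, len(docs))
```

Each document ends up as a sparse vector that stores only the terms it contains, which is one common way to keep memory use modest when the vocabulary is large.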

Keywords

data structures; indexing; information retrieval; OHSUMED data set; VSM; data structure; document representation; parsing; vector-space model; Computational modeling; Dictionaries; Indexes; Random access memory; Vectors; binary search tree; dictionary; index; linked list; posting; term frequency

fLanguage

English

Publisher

IEEE

Conference_Titel

IST-Africa Conference Proceedings, 2014

Conference_Location

Le Meridien Ile Maurice

Print_ISBN

978-1-905824-43-4

Type

conf

DOI

10.1109/ISTAFRICA.2014.6880643

Filename

6880643