• DocumentCode
    1785965
  • Title
    Data structures for information retrieval
  • Author
    Nkweteyim, Denis L.
  • Author_Institution
    Univ. of Buea, Buea, Cameroon
  • fYear
    2014
  • fDate
    7-9 May 2014
  • Firstpage
    1
  • Lastpage
    8
  • Abstract
    The process of efficiently indexing large document collections for information retrieval places large demands on a computer's memory and processor, and requires judicious use of these resources. In this paper, we describe our approach to constructing such an index based on the vector-space model (VSM). We review the stages involved in generating an index, for weighting the index terms, and for representing documents in the VSM. We explain our choice of data structures from the parsing of the document collection through the generation of index terms, to generation of document representations. We explain tradeoffs in our choice of data structures. We then demonstrate the approach using the OHSUMED data set. Our results show that even with only a modest amount of main memory (4 GB), large data sets such as the OHSUMED data set can be quickly indexed.
  • Keywords
    data structures; indexing; information retrieval; OHSUMED data set; VSM; document representation; parsing; vector-space model; Computational modeling; Dictionaries; Indexes; Random access memory; Vectors; binary search tree; dictionary; index; linked list; posting; term frequency
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    IST-Africa Conference Proceedings, 2014
  • Conference_Location
    Le Meridien Ile Maurice
  • Print_ISBN
    978-1-905824-43-4
  • Type
    conf
  • DOI
    10.1109/ISTAFRICA.2014.6880643
  • Filename
    6880643