• DocumentCode
    2016303
  • Title

    Word image based latent semantic indexing for conceptual querying in document image databases

  • Author

    Banerjee, Sameek ; Harit, Gaurav ; Chaudhury, Santanu

  • Volume
    2
  • fYear
    2007
  • fDate
    23-26 Sept. 2007
  • Firstpage
    1208
  • Lastpage
    1212
  • Abstract
    In this paper we present an application of latent semantic analysis (LSA) to the indexing and retrieval of document images containing text. The query is specified as a set of word images, and the documents that best match the query representation in the latent semantic space are retrieved. We show through extensive experiments on a large database that using LSA for document images improves retrieval precision, as it does for electronic text documents.
  • Keywords
    document image processing; image retrieval; indexing; conceptual querying; document image databases; document images indexing; document images retrieval; query representation; word image based latent semantic indexing; word images; Character recognition; Image analysis; Image databases; Image retrieval; Image segmentation; Indexing; Information analysis; Information retrieval; Ontologies; Optical character recognition software;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis and Recognition, 2007. ICDAR 2007. Ninth International Conference on
  • Conference_Location
    Parana
  • ISSN
    1520-5363
  • Print_ISBN
    978-0-7695-2822-9
  • Type

    conf

  • DOI
    10.1109/ICDAR.2007.4377107
  • Filename
    4377107