• DocumentCode
    1801167
  • Title

    Robust retrieval of noisy text

  • Author

    Lopresti, Daniel P.

  • Author_Institution
    Matsushita Information Technology Laboratory, Panasonic Technologies, Inc., Princeton, NJ, USA
  • fYear
    1996
  • fDate
    13-15 May 1996
  • Firstpage
    76
  • Lastpage
    85
  • Abstract
    We examine the effects of simulated OCR errors on Boolean query models for information retrieval. We show that even relatively small amounts of such noise can have a significant impact. To address this issue, we formulate new variants of the traditional models by combining two classic paradigms for dealing with imprecise data: approximate string matching and fuzzy logic. Using a recall/precision analysis of an experiment involving nearly 60 million query evaluations, we demonstrate that the new fuzzy retrieval methods are generally more robust than their “sharp” counterparts.
  • Keywords
    Boolean functions; fuzzy logic; information retrieval; optical character recognition; query processing; string matching; Boolean query models; approximate string matching; fuzzy retrieval method; imprecise data; noisy text retrieval; query evaluations; recall precision analysis; simulated OCR errors; Computer errors; Content based retrieval; Databases; Information technology; Laboratories; Noise robustness; Optical character recognition software
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Proceedings of the Third Forum on Research and Technology Advances in Digital Libraries (ADL '96), 1996
  • Conference_Location
    Washington, DC
  • Print_ISBN
    0-8186-7403-2
  • Type

    conf

  • DOI
    10.1109/ADL.1996.502518
  • Filename
    502518