DocumentCode
1801167
Title
Robust retrieval of noisy text
Author
Lopresti, Daniel P.
Author_Institution
Matsushita Information Technol. Lab., Panasonic Technols. Inc., Princeton, NJ, USA
fYear
1996
fDate
13-15 May 1996
Firstpage
76
Lastpage
85
Abstract
We examine the effects of simulated OCR errors on Boolean query models for information retrieval. We show that even relatively small amounts of such noise can have a significant impact. To address this issue, we formulate new variants of the traditional models by combining two classic paradigms for dealing with imprecise data: approximate string matching and fuzzy logic. Using a recall/precision analysis of an experiment involving nearly 60 million query evaluations, we demonstrate that the new fuzzy retrieval methods are generally more robust than their "sharp" counterparts.
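The combination of approximate string matching and fuzzy logic described in the abstract can be illustrated with a minimal sketch. This is not the paper's formulation: the abstract does not specify the distance measure, the normalization, or the fuzzy operators. Unit-cost Levenshtein distance, a per-term similarity of 1 - d/max(len), and Zadeh's min/max/complement operators for AND/OR/NOT are assumptions here, and the function names `edit_distance`, `term_degree`, and `evaluate` are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit costs for insertion, deletion, substitution."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def term_degree(term: str, words: list[str]) -> float:
    """Assumed fuzzy membership of a query term in a document: 1.0 for an exact
    match, lower as the best normalized edit distance to any word grows."""
    best = 0.0
    for w in words:
        d = edit_distance(term, w)
        best = max(best, 1.0 - d / max(len(term), len(w), 1))
    return best


def evaluate(query, words: list[str]) -> float:
    """Evaluate a Boolean query given as nested tuples, e.g. ("AND", "a", "b"),
    using assumed Zadeh operators: AND = min, OR = max, NOT = 1 - x."""
    if isinstance(query, str):
        return term_degree(query, words)
    op, *args = query
    vals = [evaluate(arg, words) for arg in args]
    if op == "AND":
        return min(vals)
    if op == "OR":
        return max(vals)
    if op == "NOT":
        return 1.0 - vals[0]
    raise ValueError(f"unknown operator: {op}")


if __name__ == "__main__":
    # Hypothetical OCR output in which "model" was misread as "rnodel".
    words = "the rnodel of fuzzy retrieval".split()
    query = ("AND", "model", "fuzzy")
    print("model" in words)        # False: a sharp AND query rejects this document
    print(evaluate(query, words))  # about 0.667 (= 1 - 2/6)
```

In this toy example an exact-match Boolean AND fails outright because of a single OCR confusion, while the fuzzy variant still assigns the document a graded score that can be used for ranking.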
Keywords
Boolean functions; fuzzy logic; information retrieval; optical character recognition; query processing; string matching; Boolean query models; approximate string matching; fuzzy logic; fuzzy retrieval method; imprecise data; information retrieval; noisy text retrieval; query evaluations; recall precision analysis; simulated OCR errors; Computer errors; Content based retrieval; Databases; Fuzzy logic; Information retrieval; Information technology; Laboratories; Noise robustness; Optical character recognition software; Query processing;
fLanguage
English
Publisher
ieee
Conference_Titel
Digital Libraries, 1996. ADL '96., Proceedings of the Third Forum on Research and Technology Advances in
Conference_Location
Washington, DC
Print_ISBN
0-8186-7403-2
Type
conf
DOI
10.1109/ADL.1996.502518
Filename
502518