Title :
Document filtering for fast approximate string matching of erroneous text
Author :
Takasu, Atsuhiro
Author_Institution :
Nat. Inst. of Inf., Tokyo, Japan
fDate :
2001
Abstract :
Making use of retrospective documents is important. OCR is the most widely applied technology for this purpose; however, because OCR output contains recognition errors, error-tolerant methods are essential for using OCR-processed documents. This paper discusses a filtering problem for OCR-processed documents, where filtering makes it possible to handle large numbers of such documents in an error-tolerant way. It proposes a systematic index design method for filtering and shows that filtering speeds up matching by a factor of about 360 on a database of about two million records, with little decrease in accuracy.
Keywords :
document image processing; information retrieval; optical character recognition; string matching; visual databases; OCR; database; document filtering; erroneous text; error tolerant methods; fast approximate string matching; index design method; retrospective documents; Data mining; Document handling; Filtering; Image analysis; Image databases; Informatics; Matched filters; Music information retrieval; Optical character recognition software; Text recognition;
Conference_Title :
Document Analysis and Recognition, 2001. Proceedings. Sixth International Conference on
Conference_Location :
Seattle, WA
Print_ISBN :
0-7695-1263-1
DOI :
10.1109/ICDAR.2001.953919