DocumentCode :
1584027
Title :
Document filtering for fast approximate string matching of erroneous text
Author :
Takasu, Atsuhiro
Author_Institution :
Nat. Inst. of Inf., Tokyo, Japan
fYear :
2001
fDate :
2001
Firstpage :
916
Lastpage :
920
Abstract :
Making use of retrospective documents is important. OCR is the most widely applied technology for this purpose; however, error-tolerant methods are essential for working with OCR-processed documents. This paper discusses a filtering problem for OCR-processed documents, with the aim of handling large numbers of them in an error-tolerant way. It proposes a systematic index design method for filtering and shows that filtering yields a speedup of about 360 times on a database of about two million records, with little decrease in accuracy.
Keywords :
document image processing; information retrieval; optical character recognition; string matching; visual databases; OCR; database; document filtering; erroneous text; error tolerant methods; fast approximate string matching; index design method; retrospective documents; Data mining; Document handling; Filtering; Image analysis; Image databases; Informatics; Matched filters; Music information retrieval; Optical character recognition software; Text recognition;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Document Analysis and Recognition, 2001. Proceedings. Sixth International Conference on
Conference_Location :
Seattle, WA
Print_ISBN :
0-7695-1263-1
Type :
conf
DOI :
10.1109/ICDAR.2001.953919
Filename :
953919