DocumentCode
1584027
Title
Document filtering for fast approximate string matching of erroneous text
Author
Takasu, Atsuhiro
Author_Institution
Nat. Inst. of Inf., Tokyo, Japan
fYear
2001
fDate
2001
Firstpage
916
Lastpage
920
Abstract
It is important to make use of retrospective documents. OCR is the most widely applied technology for this purpose; however, error-tolerant methods are essential for working with OCR-processed documents. This paper discusses a filtering problem for OCR-processed documents, whose solution allows large numbers of such documents to be handled in an error-tolerant way. It proposes a systematic index design method for filtering and shows that the filtering method achieves a speedup of about 360 times on a database of about two million records, with little decrease in accuracy.
Keywords
document image processing; information retrieval; optical character recognition; string matching; visual databases; OCR; database; document filtering; erroneous text; error tolerant methods; fast approximate string matching; index design method; retrospective documents; Data mining; Document handling; Filtering; Image analysis; Image databases; Informatics; Matched filters; Music information retrieval; Optical character recognition software; Text recognition
fLanguage
English
Publisher
IEEE
Conference_Titel
Document Analysis and Recognition, 2001. Proceedings. Sixth International Conference on
Conference_Location
Seattle, WA
Print_ISBN
0-7695-1263-1
Type
conf
DOI
10.1109/ICDAR.2001.953919
Filename
953919