• DocumentCode
    1584027
  • Title

    Document filtering for fast approximate string matching of erroneous text

  • Author

    Takasu, Atsuhiro

  • Author_Institution
    Nat. Inst. of Inf., Tokyo, Japan
  • fYear
    2001
  • fDate
    2001
  • Firstpage
    916
  • Lastpage
    920
  • Abstract
    It is important to utilize retrospective documents. OCR is the most widely applied technology for this purpose; however, error-tolerant methods are essential for utilizing OCR-processed documents. This paper discusses a filtering problem for OCR-processed documents whose solution enables large numbers of such documents to be handled in an error-tolerant way. It proposes a systematic index design method for filtering and shows that filtering achieves a speedup of about 360 times on a database of about two million records, with little decrease in accuracy.
  • Keywords
    document image processing; information retrieval; optical character recognition; string matching; visual databases; OCR; database; document filtering; erroneous text; error tolerant methods; fast approximate string matching; index design method; retrospective documents; Data mining; Document handling; Filtering; Image analysis; Image databases; Informatics; Matched filters; Music information retrieval; Optical character recognition software; Text recognition;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis and Recognition, 2001. Proceedings. Sixth International Conference on
  • Conference_Location
    Seattle, WA
  • Print_ISBN
    0-7695-1263-1
  • Type

    conf

  • DOI
    10.1109/ICDAR.2001.953919
  • Filename
    953919