• DocumentCode
    2192913
  • Title
    An approximate string match for garbled text with various accuracy
  • Author
    Takasu, Atsuhiro
  • Author_Institution
    Dept. of Res. & Dev., Nat. Center for Sci. Inf. Syst., Tokyo, Japan
  • Volume
    2
  • fYear
    1997
  • fDate
    18-20 Aug 1997
  • Firstpage
    957
  • Abstract
    This paper presents a fast approximate string matching method. In constructing information spaces such as digital libraries, we have to collect vast amounts of information and convert it into uniformly organized data. Since much of the information must be converted automatically from various media, the space contains garbled text of varying accuracy. To utilize these texts, we need to satisfy three requirements: high recall, high precision, and a fast matching process. To satisfy these requirements, we have been developing a two-phase matching system. The presented method is used for fast, high-recall candidate word selection in the first phase. The key idea of the method is to use a portion of the characters of a word and a distance pattern so that current index techniques can be used. Through experiments, we confirm that the presented method achieves high recall even for poorly recognized texts.
  • Keywords
    character recognition; document handling; information retrieval; string matching; accuracy; approximate string matching; candidate word selection; digital libraries; garbled text; high recall; index techniques; two-phase matching system; Computer errors; Databases; Extraterrestrial measurements; Information retrieval; Information systems; Software libraries; Space technology; Speech; Text recognition; Video on demand
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis and Recognition, 1997., Proceedings of the Fourth International Conference on
  • Conference_Location
    Ulm
  • Print_ISBN
    0-8186-7898-4
  • Type
    conf
  • DOI
    10.1109/ICDAR.1997.620652
  • Filename
    620652