• DocumentCode
    3485951
  • Title
    Clustering of Symbols Using Minimal Description Length
  • Author
    Tataw, Oben M.; Rakthanmanon, Thanawin; Keogh, Eamonn J.
  • Author_Institution
    Univ. of California, Riverside, Riverside, CA, USA
  • fYear
    2013
  • fDate
    25-28 Aug. 2013
  • Firstpage
    180
  • Lastpage
    184
  • Abstract
    The clustering of glyphs (individual letters/characters/symbols) is typically the first step in document processing algorithms and a critical enabling technology for most historical document indexing techniques. In this work, we take a step back from current domain- or language-specialized research efforts to consider the problem from an agnostic perspective. In particular, we claim that, independent of the distance measure used, any method that attempts to cluster all the data is almost certainly doomed to failure. We explain this observation and introduce a clustering method based on Minimum Description Length (MDL) that can overcome it.
  • Keywords
    document image processing; image classification; pattern clustering; MDL; document processing algorithms; glyphs clustering; minimal description length; symbol clustering; Accuracy; Algorithm design and analysis; Approximation algorithms; Character recognition; Clustering algorithms; Clustering methods; Encoding; Clustering; Image Similarity; MDL;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Document Analysis and Recognition (ICDAR), 2013 12th International Conference on
  • Conference_Location
    Washington, DC
  • ISSN
    1520-5363
  • Type
    conf
  • DOI
    10.1109/ICDAR.2013.43
  • Filename
    6628608