DocumentCode
3485951
Title
Clustering of Symbols Using Minimal Description Length
Author
Tataw, Oben M. ; Rakthanmanon, Thanawin ; Keogh, Eamonn J.
Author_Institution
Univ. of California, Riverside, Riverside, CA, USA
fYear
2013
fDate
25-28 Aug. 2013
Firstpage
180
Lastpage
184
Abstract
The clustering of glyphs (individual letters/characters/symbols) is typically the first step in document processing algorithms and a critical enabling technology for most historical document indexing techniques. In this work, we take a step back from current domain/language specialized research efforts to consider the problem from an agnostic perspective. In particular, we claim that, independent of the distance measure used, any method that attempts to cluster all the data is almost certainly doomed to failure. We explain this observation, and introduce a clustering method based on Minimum Description Length (MDL) that can overcome it.
Keywords
document image processing; image classification; pattern clustering; MDL; document processing algorithms; glyphs clustering; minimal description length; symbol clustering; Accuracy; Algorithm design and analysis; Approximation algorithms; Character recognition; Clustering algorithms; Clustering methods; Encoding; Clustering; Image Similarity; MDL;
fLanguage
English
Publisher
IEEE
Conference_Titel
Document Analysis and Recognition (ICDAR), 2013 12th International Conference on
Conference_Location
Washington, DC
ISSN
1520-5363
Type
conf
DOI
10.1109/ICDAR.2013.43
Filename
6628608