DocumentCode :
3485951
Title :
Clustering of Symbols Using Minimal Description Length
Author :
Tataw, Oben M. ; Rakthanmanon, Thanawin ; Keogh, Eamonn J.
Author_Institution :
Univ. of California, Riverside, Riverside, CA, USA
fYear :
2013
fDate :
25-28 Aug. 2013
Firstpage :
180
Lastpage :
184
Abstract :
The clustering of glyphs (individual letters/characters/symbols) is typically the first step in document processing algorithms and a critical enabling technology for most historical document indexing techniques. In this work, we take a step back from current domain/language specialized research efforts to consider the problem from an agnostic perspective. In particular, we claim that, independent of the distance measure used, any method that attempts to cluster all the data is almost certainly doomed to failure. We explain this observation, and introduce a clustering method based on Minimum Description Length (MDL) that can overcome it.
Keywords :
document image processing; image classification; pattern clustering; MDL; document processing algorithms; glyphs clustering; minimal description length; symbol clustering; Accuracy; Algorithm design and analysis; Approximation algorithms; Character recognition; Clustering algorithms; Clustering methods; Encoding; Clustering; Image Similarity
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Document Analysis and Recognition (ICDAR), 2013 12th International Conference on
Conference_Location :
Washington, DC
ISSN :
1520-5363
Type :
conf
DOI :
10.1109/ICDAR.2013.43
Filename :
6628608