Title :
Handwritten text line extraction based on minimum spanning tree clustering
Author :
Yin, Fei ; Liu, Cheng-Lin
Author_Institution :
Inst. of Autom., Chinese Acad. of Sci., Beijing
Abstract :
Text line extraction from unconstrained handwritten documents is challenging because text lines are often skewed and curved, and the gaps between adjacent lines are not obvious. To solve this problem, we propose an approach based on minimum spanning tree (MST) clustering with new distance measures. First, the connected components of the document image are grouped into a tree by MST clustering with a new distance measure. The edges of the tree are then dynamically cut to form text lines, using a new objective function to determine the number of clusters. The approach is entirely parameter-free and can be applied to various documents with multi-skewed and curved lines. Experiments on handwritten Chinese documents demonstrate the effectiveness of the approach.
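The grouping-and-cutting pipeline described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method: each connected component is reduced to its centroid, the hypothetical `line_distance` simply penalizes vertical offsets more than horizontal ones (the paper's own distance measures are not reproduced here), and the number of lines `num_lines` is passed in explicitly instead of being chosen by the paper's objective function. Cutting the `num_lines - 1` heaviest MST edges is equivalent to single-linkage clustering into that many groups.

```python
import math


def line_distance(a, b, vertical_weight=3.0):
    """Hypothetical distance between two component centroids (x, y).

    Vertical offsets are weighted more heavily than horizontal ones, so
    components on the same text line tend to be linked first.
    This is NOT the distance measure proposed in the paper.
    """
    dx = a[0] - b[0]
    dy = a[1] - b[1]
    return math.hypot(dx, vertical_weight * dy)


def mst_edges(points, dist=line_distance):
    """Prim's algorithm on the complete graph; returns (weight, i, j) edges."""
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known link from the tree to each vertex
    parent = [-1] * n       # tree vertex providing that cheapest link
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Add the cheapest vertex that is not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        # Update the cheapest links of the remaining vertices.
        for v in range(n):
            if not in_tree[v]:
                d = dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges


def cluster_lines(points, num_lines):
    """Cut the (num_lines - 1) heaviest MST edges and label each subtree.

    `num_lines` is a hypothetical input; the paper instead chooses the
    number of clusters with its own objective function.
    """
    n = len(points)
    edges = sorted(mst_edges(points))
    # Keeping the lightest n - num_lines edges removes the heaviest ones.
    keep = edges[: max(0, len(edges) - (num_lines - 1))]
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for _, i, j in keep:
        parent[find(i)] = find(j)
    # Relabel roots as 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]
```

For two horizontal rows of three components each, such as `[(0, 10), (10, 10), (20, 10), (0, 40), (10, 40), (20, 40)]`, `cluster_lines(pts, 2)` returns `[0, 0, 0, 1, 1, 1]`. Since the approach in the paper is parameter-free, a faithful implementation would replace both `vertical_weight` and `num_lines` with the proposed distance measures and objective function.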
Keywords :
document image processing; feature extraction; handwritten character recognition; pattern clustering; trees (mathematics); handwritten text line extraction; minimum spanning tree clustering; unconstrained handwritten document image; Character recognition; Notice of Violation; Optical character recognition software; Pattern analysis; Pattern recognition; Performance analysis; Pixel; Strips; Text analysis; Wavelet analysis; Connected component labeling; Handwritten text line extraction; MST clustering; Multi-skewed document; OCR
Conference_Title :
Wavelet Analysis and Pattern Recognition, 2007. ICWAPR '07. International Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4244-1065-1
Electronic_ISBN :
978-1-4244-1066-8
DOI :
10.1109/ICWAPR.2007.4421601