Title :
OTCYMIST: Otsu-Canny Minimal Spanning Tree for Born-Digital Images
Author :
Kumar, Deepak ; Ramakrishnan, A.G.
Author_Institution :
Dept. of Electr. Eng., Indian Inst. of Sci., Bangalore, India
Abstract :
Text segmentation and localization algorithms are proposed for the born-digital image dataset. Binarization and edge detection are separately carried out on the three colour planes of the image. Connected components (CCs) obtained from the binarized image are thresholded based on their area and aspect ratio. CCs which contain sufficient edge pixels are retained. A novel approach is presented, where the text components are represented as nodes of a graph. Nodes correspond to the centroids of the individual CCs. Long edges are broken from the minimum spanning tree of the graph. Pairwise height ratio is also used to remove likely non-text components. A new minimum spanning tree is created from the remaining nodes. Horizontal grouping is performed on the CCs to generate bounding boxes of text strings. Overlapping bounding boxes are removed using an overlap area threshold. Non-overlapping and minimally overlapping bounding boxes are used for text segmentation. Vertical splitting is applied to generate bounding boxes at the word level. The proposed method is applied to all the images of the test dataset and values of precision, recall and H-mean are obtained using different approaches.
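The graph step in the abstract (centroids as nodes, minimum spanning tree, long edges broken) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the use of Prim's algorithm on a complete Euclidean graph, the sample centroids and the `max_edge_length` threshold are all assumptions made for illustration, since the abstract gives no parameter values.

```python
import math


def minimum_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph.

    Returns a list of (i, j, length) edges. Hypothetical helper, not from the paper.
    """
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known distance from the tree to each node
    parent = [-1] * n       # tree node that realises that distance
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the node outside the tree that is closest to it.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        # Relax distances of the remaining nodes against the new tree node.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges


def group_by_breaking_long_edges(points, max_edge_length):
    """Drop MST edges longer than max_edge_length (an assumed threshold)
    and return the resulting connected groups as sorted lists of indices."""
    root = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while root[i] != i:
            root[i] = root[root[i]]
            i = root[i]
        return i

    for i, j, length in minimum_spanning_tree(points):
        if length <= max_edge_length:
            root[find(i)] = find(j)

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Made-up centroids of connected components: two separate text strings.
centroids = [(10, 20), (22, 21), (35, 19), (200, 150), (212, 151)]
groups = group_by_breaking_long_edges(centroids, max_edge_length=30.0)
print(groups)
```

With these sample centroids the single long edge joining the two clusters is removed, leaving the groups `[[0, 1, 2], [3, 4]]`. In the pipeline described above, such groups would then be candidates for horizontal grouping into text-string bounding boxes.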
Keywords :
edge detection; image colour analysis; image representation; image segmentation; text detection; trees (mathematics); visual databases; OTCYMIST; Otsu-Canny minimal spanning tree; aspect ratio; born-digital image dataset; connected components; edge detection; edge pixels; graph nodes; horizontal grouping; image binarization; image colour planes; image thresholding; minimally overlapping bounding box; nonoverlapping bounding box; nontext component removal; overlapping bounding box removal; pairwise height ratio; text component representation; text localization algorithm; text segmentation algorithm; text strings; vertical splitting; Algorithm design and analysis; Image color analysis; Image edge detection; Image segmentation; Robustness; Training; Vegetation; Binarization; Edge detection; Minimum spanning tree; Text localization; Text segmentation;
Conference_Title :
Document Analysis Systems (DAS), 2012 10th IAPR International Workshop on
Conference_Location :
Gold Coast, QLD
Print_ISBN :
978-1-4673-0868-7
DOI :
10.1109/DAS.2012.65