• DocumentCode
    2012222
  • Title

    OTCYMIST: Otsu-Canny Minimal Spanning Tree for Born-Digital Images

  • Author

    Kumar, Deepak ; Ramakrishnan, A.G.

  • Author_Institution
    Dept. of Electr. Eng., Indian Inst. of Sci., Bangalore, India
  • fYear
    2012
  • fDate
    27-29 March 2012
  • Firstpage
    389
  • Lastpage
    393
  • Abstract
    Text segmentation and localization algorithms are proposed for the born-digital image dataset. Binarization and edge detection are separately carried out on the three colour planes of the image. Connected components (CCs) obtained from the binarized image are thresholded based on their area and aspect ratio. CCs that contain sufficient edge pixels are retained. A novel approach is presented, where the text components are represented as nodes of a graph. Nodes correspond to the centroids of the individual CCs. Long edges are broken from the minimum spanning tree of the graph. Pairwise height ratio is also used to remove likely non-text components. A new minimum spanning tree is created from the remaining nodes. Horizontal grouping is performed on the CCs to generate bounding boxes of text strings. Overlapping bounding boxes are removed using an overlap area threshold. Non-overlapping and minimally overlapping bounding boxes are used for text segmentation. Vertical splitting is applied to generate bounding boxes at the word level. The proposed method is applied to all the images of the test dataset, and values of precision, recall and H-mean are obtained using different approaches.
  • Keywords
    edge detection; image colour analysis; image representation; image segmentation; text detection; trees (mathematics); visual databases; OTCYMIST; Otsu-Canny minimal spanning tree; aspect ratio; born-digital image dataset; connected components; edge detection; edge pixels; graph nodes; horizontal grouping; image binarization; image colour planes; image thresholding; minimally overlapping bounding box; nonoverlapping bounding box; nontext component removal; overlapping bounding box removal; pairwise height ratio; text component representation; text localization algorithm; text segmentation algorithm; text strings; vertical splitting; Algorithm design and analysis; Image color analysis; Image edge detection; Image segmentation; Robustness; Training; Vegetation; Binarization; Edge detection; Minimum spanning tree; Text localization; Text segmentation;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis Systems (DAS), 2012 10th IAPR International Workshop on
  • Conference_Location
    Gold Coast, QLD
  • Print_ISBN
    978-1-4673-0868-7
  • Type

    conf

  • DOI
    10.1109/DAS.2012.65
  • Filename
    6195400
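
The graph step described in the abstract (CC centroids as nodes, a minimum spanning tree over them, long edges broken to separate groups) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the use of Euclidean distance between centroids, and the single fixed `max_edge_length` threshold are all assumptions made for illustration, and the pairwise height-ratio test and the second spanning tree are omitted.

```python
import math


def minimum_spanning_tree(points):
    """Prim's algorithm on the complete graph over 2-D points.

    Returns a list of (i, j, length) edges. Euclidean distance between
    centroids is an assumption; the abstract does not name the edge weight.
    """
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known distance from each node to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the node outside the tree that is closest to it.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        # Relax distances of the remaining nodes through u.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges


def group_by_short_edges(points, max_edge_length):
    """Drop MST edges longer than max_edge_length (a hypothetical threshold)
    and return the resulting groups of point indices."""
    root = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while root[i] != i:
            root[i] = root[root[i]]
            i = root[i]
        return i

    for i, j, length in minimum_spanning_tree(points):
        if length <= max_edge_length:
            root[find(i)] = find(j)

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Made-up centroids: three components on the left, two far to the right.
centroids = [(10, 10), (22, 11), (35, 10), (200, 12), (212, 11)]
groups = group_by_short_edges(centroids, max_edge_length=50)
```

With these made-up centroids, the one long tree edge between the left and right clusters is broken, so the five components fall into two groups. In practice the threshold would have to be tuned on the data; the abstract gives no value for it.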