DocumentCode :
3488489
Title :
A Pair-Copula Based Scheme for Text Extraction from Digital Images
Author :
Roy, Anirban ; Parui, Swapan K. ; Roy, Utpal
Author_Institution :
CVPR Unit, Indian Stat. Inst., Kolkata, India
fYear :
2013
fDate :
25-28 Aug. 2013
Firstpage :
892
Lastpage :
896
Abstract :
This paper presents a statistical model based scheme for automatic extraction of text components from digital images. The work consists of two tasks. First, we segment a color image by applying a pair-copula based mixture model. This produces a number of spatially connected components, some of which may be text. From each of these components, we extract features that can discriminate text from non-text components. The feature vectors arising from text components are assumed to be random samples from a pair-copula based multivariate distribution. The parameters of this distribution can be estimated from training text samples (i.e., connected components). Here, we use the ICDAR 2011 "Born-Digital Images" data set, since it provides such ground-truth text components, and we estimate the distribution parameters from the feature vectors obtained from these training text components. The remaining task is to infer whether a test sample is a text component. We apply a non-parametric statistical hypothesis test to assess whether a test sample was generated from the known multivariate distribution. If so, we may regard the sample as text. Our results on the ICDAR 2011 "Born-Digital Images" data set are satisfactory.
Keywords :
feature extraction; image colour analysis; image segmentation; statistical testing; text detection; ICDAR 2011; automatic extraction; color image segmentation; connected components; digital images; feature vectors; nonparametric statistical hypothesis testing; pair-copula based multivariate distribution; pair-copula based scheme; statistical model; text components; text extraction; Digital images; Distribution functions; Feature extraction; Image color analysis; Image segmentation; Training; Vectors;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Document Analysis and Recognition (ICDAR), 2013 12th International Conference on
Conference_Location :
Washington, DC
ISSN :
1520-5363
Type :
conf
DOI :
10.1109/ICDAR.2013.182
Filename :
6628747