• DocumentCode
    1632782
  • Title
    Enhanced Text Extraction from Arabic Degraded Document Images Using EM Algorithm
  • Author
    Boussellaa, Wafa ; Bougacha, Aymen ; Zahour, Abderrazak ; El Abed, Haikal ; Alimi, Adel
  • Author_Institution
    ENIS, Univ. of Sfax, Sfax, Tunisia
  • fYear
    2009
  • Firstpage
    743
  • Lastpage
    747
  • Abstract
    This paper presents a new enhanced text extraction algorithm for degraded document images, based on probabilistic models. The observed document image is modeled as a mixture of Gaussian densities that represent the foreground and background components of the document image. The EM algorithm is used to iteratively estimate and refine the parameters of the mixture densities. The initial parameters of the EM algorithm are estimated by the k-means clustering method. After parameter estimation, the document image is partitioned into text and background classes by means of a maximum likelihood (ML) approach. The performance of the proposed approach is evaluated on a variety of degraded documents from the collections of the National Library of Tunisia.
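    The pipeline outlined in the abstract (k-means initialization, EM fitting of a Gaussian mixture, ML labeling) can be sketched as follows. This is only an illustrative sketch and not the authors' implementation: it assumes a grayscale image with at least two distinct intensities, exactly two mixture components, and dark text on a lighter background. All function names (`kmeans_1d`, `fit_gmm_em`, `segment`) and iteration counts are hypothetical.

```python
import numpy as np

def kmeans_1d(x, iters=20):
    """Two-cluster k-means on 1-D intensities; returns a label (0 or 1) per pixel."""
    centers = np.array([x.min(), x.max()], dtype=float)  # assumed initialization
    for _ in range(iters):
        labels = np.abs(x[:, None] - centers[None, :]).argmin(axis=1)
        for k in range(2):
            if np.any(labels == k):
                centers[k] = x[labels == k].mean()
    return labels

def gaussian(x, mu, var):
    """Univariate Gaussian density, broadcast over components."""
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def fit_gmm_em(x, iters=50, eps=1e-6):
    """Fit a two-component 1-D Gaussian mixture with EM, initialized by k-means."""
    labels = kmeans_1d(x)
    w = np.array([(labels == k).mean() for k in range(2)])
    mu = np.array([x[labels == k].mean() for k in range(2)])
    var = np.array([x[labels == k].var() + eps for k in range(2)])
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each pixel
        dens = w * gaussian(x[:, None], mu, var)
        resp = dens / (dens.sum(axis=1, keepdims=True) + 1e-300)
        # M-step: re-estimate weights, means, and variances
        nk = resp.sum(axis=0)
        w = nk / x.size
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk + eps
    return w, mu, var

def segment(image):
    """Return a boolean mask that is True where a pixel is labeled as text."""
    x = np.asarray(image, dtype=float).ravel()
    _, mu, var = fit_gmm_em(x)
    # ML decision: pick the class whose density is highest (priors ignored)
    labels = gaussian(x[:, None], mu, var).argmax(axis=1)
    text_class = mu.argmin()  # assumption: ink is darker than paper
    return (labels == text_class).reshape(np.shape(image))
```

    A typical call would be `mask = segment(gray_image)`, where `gray_image` is a 2-D array of intensities. Ignoring the mixture weights in the final step is what makes the decision maximum likelihood rather than maximum a posteriori.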
  • Keywords
    Gaussian processes; document image processing; expectation-maximisation algorithm; image representation; image segmentation; maximum likelihood detection; natural language processing; parameter estimation; probability; text analysis; Arabic degraded document image; EM algorithm; Gaussian mixture; ML algorithm; National library of Tunisia; enhanced text extraction algorithm; image representation; k-means clustering method; parameter estimation; probabilistic model; Algorithm design and analysis; Clustering algorithms; Clustering methods; Degradation; Image enhancement; Image segmentation; Maximum likelihood estimation; Parameter estimation; Partitioning algorithms; Text analysis; Arabic degraded document image; Maximum likelihood algorithm (ML); expectation-maximisation algorithm (EM); k-means clustering; segmentation
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis and Recognition, 2009. ICDAR '09. 10th International Conference on
  • Conference_Location
    Barcelona
  • ISSN
    1520-5363
  • Print_ISBN
    978-1-4244-4500-4
  • Electronic_ISBN
    1520-5363
  • Type
    conf
  • DOI
    10.1109/ICDAR.2009.220
  • Filename
    5277497