DocumentCode :
3340098
Title :
Feature Extraction for Document Image Segmentation by pLSA Model
Author :
Yamaguchi, Takuma ; Maruyama, Minoru
Author_Institution :
Dept. of Inf. Eng., Shinshu Univ., Nagano
fYear :
2008
fDate :
16-19 Sept. 2008
Firstpage :
53
Lastpage :
60
Abstract :
In this paper, we propose a method for document image segmentation based on the pLSA (probabilistic latent semantic analysis) model. The pLSA model was originally developed for topic discovery in text analysis using a "bag-of-words" document representation. The model is also useful for image analysis with a "bag-of-visual-words" image representation. The performance of the method depends on the visual vocabulary generated by feature extraction from the document image. We compare several feature extraction and description methods and examine how they relate to segmentation performance. Through experiments, we show that accurate content-based document segmentation is made possible by the pLSA-based method.
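The abstract rests on the pLSA model, which explains word counts through latent topics: P(w|d) = sum_z P(w|z) P(z|d). As a minimal sketch, assuming the document image has already been divided into regions and each region has been quantized into a histogram of visual-word counts (the feature extraction the paper compares is not reproduced here), the standard EM fitting of pLSA can be written in plain Python. The names `plsa`, `log_likelihood` and `segment` are hypothetical illustrations, not the authors' code, and labeling each region by its most probable topic is one simple assumed way of turning topics into a segmentation.

```python
import math
import random


def _normalize(row):
    """Scale non-negative numbers so they sum to 1 (uniform if all are zero)."""
    s = sum(row)
    if s == 0:
        return [1.0 / len(row)] * len(row)
    return [x / s for x in row]


def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit pLSA by EM on a region-by-visual-word count matrix.

    counts[d][w] is how often visual word w occurs in region d (assumed input).
    Returns (p_w_z, p_z_d) with p_w_z[z][w] = P(w|z) and p_z_d[d][z] = P(z|d).
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])
    # Random positive initialization of both conditional distributions.
    p_w_z = [_normalize([rng.random() + 1e-3 for _ in range(n_words)])
             for _ in range(n_topics)]
    p_z_d = [_normalize([rng.random() + 1e-3 for _ in range(n_topics)])
             for _ in range(n_docs)]

    for _ in range(n_iter):
        # Expected counts accumulated for the M-step.
        exp_w_z = [[0.0] * n_words for _ in range(n_topics)]
        exp_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: P(z|d,w) is proportional to P(w|z) P(z|d).
                joint = [p_w_z[z][w] * p_z_d[d][z] for z in range(n_topics)]
                total = sum(joint)
                for z in range(n_topics):
                    r = n * joint[z] / total
                    exp_w_z[z][w] += r
                    exp_z_d[d][z] += r
        # M-step: renormalize the expected counts into distributions.
        p_w_z = [_normalize(row) for row in exp_w_z]
        p_z_d = [_normalize(row) for row in exp_z_d]
    return p_w_z, p_z_d


def log_likelihood(counts, p_w_z, p_z_d):
    """Sum of n(d,w) * log P(w|d), where P(w|d) = sum_z P(w|z) P(z|d)."""
    ll = 0.0
    for d, row in enumerate(counts):
        for w, n in enumerate(row):
            if n:
                p = sum(p_w_z[z][w] * p_z_d[d][z] for z in range(len(p_w_z)))
                ll += n * math.log(p)
    return ll


def segment(p_z_d):
    """Hypothetical labeling rule: assign each region its most probable topic."""
    return [max(range(len(row)), key=row.__getitem__) for row in p_z_d]
```

For example, on a toy 4x4 count matrix where regions 0 and 1 use only visual words 0 and 1 and regions 2 and 3 use only words 2 and 3, calling `plsa(counts, 2)` and then `segment(...)` would be expected to place the two groups in different topics. EM only guarantees that the likelihood does not decrease, so it may stop at a local maximum and the result can depend on `seed`.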
Keywords :
document image processing; feature extraction; image representation; image segmentation; text analysis; document image segmentation; document representation; probabilistic latent semantic analysis; topic discovery; visual vocabulary; Engines; Graphical models; Image analysis; Information analysis; Optical character recognition software; Vocabulary; topic model
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Document Analysis Systems, 2008. DAS '08. The Eighth IAPR International Workshop on
Conference_Location :
Nara
Print_ISBN :
978-0-7695-3337-7
Type :
conf
DOI :
10.1109/DAS.2008.48
Filename :
4669945