DocumentCode :
1216285
Title :
Stochastic language models for style-directed layout analysis of document images
Author :
Kanungo, Tapas ; Mao, Song
Author_Institution :
IBM Almaden Res. Center, San Jose, CA, USA
Volume :
12
Issue :
5
fYear :
2003
fDate :
5/1/2003 12:00:00 AM
Firstpage :
583
Lastpage :
596
Abstract :
Image segmentation is an important component of any document image analysis system. While many segmentation algorithms exist in the literature, very few i) allow users to specify the physical style, and ii) incorporate user-specified style information into the algorithm's objective function that is to be minimized. We describe a segmentation algorithm that models a document's physical structure as a hierarchical structure where each node describes a region of the document using a stochastic regular grammar. The exact form of the hierarchy and the stochastic language is specified by the user, while the probabilities associated with the transitions are estimated from groundtruth data. We demonstrate the segmentation algorithm on images of bilingual dictionaries.
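The abstract does not say how the transition probabilities are estimated, so the following is only a minimal sketch of one common approach: relative-frequency (maximum-likelihood) counts over labeled sequences. The function name `estimate_transitions`, the `START`/`END` markers, and the line labels are hypothetical and not taken from the paper.

```python
from collections import Counter, defaultdict


def estimate_transitions(sequences):
    """Estimate transition probabilities by relative frequency.

    Each sequence is a list of region labels read in order; START and END
    are hypothetical boundary states added for illustration.
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        path = ["START", *seq, "END"]
        # Count every adjacent pair of states along the path.
        for prev, nxt in zip(path, path[1:]):
            counts[prev][nxt] += 1
    # Normalize each state's outgoing counts so they sum to 1.
    return {
        prev: {nxt: n / sum(c.values()) for nxt, n in c.items()}
        for prev, c in counts.items()
    }


# Made-up groundtruth: line labels for three dictionary entries.
groundtruth = [
    ["headword", "definition", "definition"],
    ["headword", "definition"],
    ["headword", "example", "definition"],
]
probs = estimate_transitions(groundtruth)
```

Here the user supplies the set of states (the form of the language), and only the numbers attached to the transitions come from data, which mirrors the division of labor the abstract describes.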
Keywords :
document image processing; grammars; image segmentation; stochastic processes; bilingual dictionaries; document image analysis system; document images; hierarchical structure; hierarchy; image segmentation; objective function; physical style; stochastic language models; stochastic regular grammar; style-directed layout analysis; user-specified style information; Algorithm design and analysis; Dictionaries; Hidden Markov models; Image analysis; Image segmentation; Information retrieval; Natural languages; Speech recognition; Stochastic processes; Text analysis
fLanguage :
English
Journal_Title :
Image Processing, IEEE Transactions on
Publisher :
IEEE
ISSN :
1057-7149
Type :
jour
DOI :
10.1109/TIP.2003.811487
Filename :
1203151