DocumentCode :
3304790
Title :
Zone classification in a document using the method of feature vector generation
Author :
Sivaramakrishnan, Ramaswamy ; Phillips, Ihsin T. ; Ha, Jaekyu ; Subramanium, Suresh ; Haralick, Robert M.
Author_Institution :
Dept. of Electr. Eng., Washington Univ., Seattle, WA, USA
Volume :
2
fYear :
1995
fDate :
14-16 Aug 1995
Firstpage :
541
Abstract :
A document can be divided into zones on the basis of its content. For example, a zone can be either text or non-text. This paper describes an algorithm that classifies each given document zone into one of nine different classes. For each zone, the following features are extracted: run length mean and variance, spatial mean and variance, the fraction of the total number of black pixels in the zone, and the zone width ratio. Run-length-related features are computed along four different canonical directions. A decision tree classifier assigns a zone class on the basis of the zone's feature vector. The performance on an independent test set was 97%.
Keywords :
decision theory; document image processing; feature extraction; image classification; black pixels; decision tree classifier; document; feature vector; feature vector generation; performance; run length mean; spatial mean; zone classification; zone width ratio; Binary trees; Classification tree analysis; Decision trees; Pattern recognition; Shape; Testing;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Document Analysis and Recognition, 1995., Proceedings of the Third International Conference on
Conference_Location :
Montreal, Que.
Print_ISBN :
0-8186-7128-9
Type :
conf
DOI :
10.1109/ICDAR.1995.601954
Filename :
601954