DocumentCode :
3767812
Title :
Sentence Detection and Extraction in machine printed imaged document using matching technique
Author :
Shalini Puri;Satya Prakash Singh
Author_Institution :
Department of Computer Science, Birla Institute of Technology, Ranchi, India
fYear :
2015
Firstpage :
1
Lastpage :
6
Abstract :
Sentence extraction is a new, challenging, and critical step in processing printed, scanned document images. This paper proposes an efficient four-layered Sentence Detection and Extraction System (SDES) model designed to detect and extract sentences from machine-printed document images. Its architecture and internal details show how it processes an image to find the underlying sentences. The basic idea is to first preprocess the document image for noise removal and skew correction, and then to detect and segment textual entities at the page, line, and word levels. Horizontal and vertical projection profiles are used to segment and separate the lines and words. After skew correction, two-stage Character Based and Word Based Leveled matching and testing are performed; these verify and identify the correct character and word by searching for similar characters and words in the Character Set Storage (CSS) and the Word Pseudo Thesaurus (WPT). Any word pattern that the WPT cannot match and identify is stored in the Unmatched Word Storage (UWS) for future reference. Testing and verification at these two levels increase the accuracy of SDES and reduce errors, which greatly improves system performance. Finally, all sentences of the document image are extracted. Experimental results are reported at the character, word, and sentence levels; the accuracy at each level is good, indicating high system performance and efficiency.
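The projection-profile step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`runs`, `segment_lines`, `segment_words`), the binary-image representation (1 = ink, 0 = background), and the `min_gap` threshold are all assumptions made for illustration. It assumes the image has already been binarized and deskewed.

```python
from typing import List, Tuple

Image = List[List[int]]  # assumed binary image: 1 = ink, 0 = background


def runs(profile: List[int], min_gap: int = 1) -> List[Tuple[int, int]]:
    """Return (start, end) spans (end exclusive) of non-zero runs in a
    projection profile, merging runs separated by fewer than min_gap zeros."""
    spans: List[Tuple[int, int]] = []
    start = None
    for i, value in enumerate(profile):
        if value > 0 and start is None:
            start = i
        elif value == 0 and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(profile)))

    merged: List[Tuple[int, int]] = []
    for span in spans:
        if merged and span[0] - merged[-1][1] < min_gap:
            merged[-1] = (merged[-1][0], span[1])
        else:
            merged.append(span)
    return merged


def segment_lines(image: Image) -> List[Tuple[int, int]]:
    """Horizontal projection: count ink pixels per row, then split on blank rows."""
    horizontal = [sum(row) for row in image]
    return runs(horizontal)


def segment_words(image: Image, line: Tuple[int, int],
                  min_gap: int = 2) -> List[Tuple[int, int]]:
    """Vertical projection inside one text line: count ink pixels per column,
    then split on blank-column gaps at least min_gap wide (hypothetical
    threshold separating word gaps from narrower character gaps)."""
    top, bottom = line
    width = len(image[0])
    vertical = [sum(image[r][c] for r in range(top, bottom)) for c in range(width)]
    return runs(vertical, min_gap)


if __name__ == "__main__":
    page = [
        [0, 0, 0, 0, 0, 0, 0, 0],
        [1, 1, 0, 0, 0, 1, 1, 0],
        [1, 0, 1, 0, 0, 1, 1, 0],
        [0, 0, 0, 0, 0, 0, 0, 0],
        [0, 1, 1, 1, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0],
    ]
    for line in segment_lines(page):
        print(line, segment_words(page, line))
```

In practice the gap threshold would have to be tuned to the font size and scan resolution; the value used here is only a placeholder.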
Keywords :
"Image segmentation","Feature extraction","Layout","Data mining","Testing","Cascading style sheets","Hidden Markov models"
Publisher :
IEEE
Conference_Title :
Recent Advances in Engineering & Computational Sciences (RAECS), 2015 2nd International Conference on
Type :
conf
DOI :
10.1109/RAECS.2015.7453382
Filename :
7453382