Title :
Detection and segmentation of table of contents and index pages from document images
Author :
Mandal, S. ; Chowdhury, S.P. ; Das, A.K. ; Chanda, Bhabatosh
Author_Institution :
Dept. of Comput. Sci. & Technol., Bengal Eng. & Sci. Univ., Howrah
Abstract :
Identification and segmentation of the table of contents (TOC) and index pages is an obvious task in the development of a digital library. A digital document library is created to provide a non-labour-intensive, cheap and flexible way of storing, representing and managing paper documents in electronic form, so as to facilitate indexing, viewing, printing and extraction of the intended portions. Using document image analysis techniques, information from the TOC and index pages may be extracted for use in a document database, enabling effective retrieval of the required pieces of information. In this paper, we present fully automatic identification and segmentation of TOC and index pages from scanned documents.
Keywords :
digital libraries; document image processing; image segmentation; indexing; information retrieval; digital document library; document database; document image analysis; document images; electronic document storage; fully automatic table of contents detection; fully automatic table of contents identification; index pages detection; index pages identification; index pages segmentation; information extraction; paper document management; paper document representation; paper document storage; scanned documents; table of contents segmentation
Conference_Title :
Document Image Analysis for Libraries, 2006. DIAL '06. Second International Conference on
Conference_Location :
Lyon
Print_ISBN :
0-7695-2531-8
DOI :
10.1109/DIAL.2006.13