Title :
Using electronic facsimiles of documents for automatic reconstruction of underlying hypertext structures
Author :
Myka, A. ; Güntzer, U.
Author_Institution :
Wilhelm-Schickard-Inst., Tübingen Univ., Germany
Abstract :
When looking for detailed pieces of information within the facsimiles of documents, a user has only a highly limited set of supportive tools at his disposal. As a solution to this problem, the use of automatically generated hypertext structures is proposed. These structures include conventional mechanisms such as a table of contents, an index, or full-text search, but also extend to associative searches. The automatic generation of hypertext structures is based on two sources of information: the output of a commercial OCR system and a document-type-dependent specification file that specifies both structure elements and link types. Thus, information hidden in layout and typography is taken into account in addition to the plain ASCII representation of the document. The browsing problems that may arise from a hypertext's nonlinearity do not appear, because the user also has access to the document in its original linear form.
Keywords :
data structures; document handling; document image processing; hypermedia; information retrieval; optical character recognition; associative searches; automatic generation; automatic reconstruction; automatically generated hypertext structures; browsing problems; commercial OCR system; document type dependent specification file; electronic facsimiles; full text search; link types; plain ASCII representation; structure elements; table of contents; typography; underlying hypertext structures; Facsimile; Hypertext systems; Image reconstruction; Image retrieval; Image storage; Libraries; Optical character recognition software; Prototypes; Space technology; Sun;
Conference_Title :
Document Analysis and Recognition, 1993., Proceedings of the Second International Conference on
Conference_Location :
Tsukuba Science City
Print_ISBN :
0-8186-4960-7
DOI :
10.1109/ICDAR.1993.395680