DocumentCode :
311135
Title :
(Chem)DeTEX automatic generation of a markup language description of (chemical) documents from bitmap images
Author :
Simon, Aniko ; Pret, Jean-Christophe ; Johnson, A. Peter
Author_Institution :
Sch. of Chem., Leeds Univ., UK
Volume :
1
fYear :
1995
fDate :
14-16 Aug 1995
Firstpage :
458
Abstract :
This paper presents a novel view of document processing as the reverse of the TeX process. This concept simplifies the analysis of the physical structure of documents, and also suggests the use of a style file for layout recognition. An algorithm is given for each of the two phases, layout analysis and layout recognition. The bottom-up layout analysis method is based on Kruskal's algorithm and uses the distances between components to construct the physical page structure. The algorithm is linear with respect to the number of connected components. For layout recognition, a document style description language (DSDL) is introduced. This allows a fault-tolerant, recursive parsing algorithm to label the blocks of the document. The presented methods were designed for scientific publications (papers, reports, books), but could be applied to a broader range of documents.
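The abstract gives no implementation details, so the following is only a minimal sketch of the general idea of Kruskal-style bottom-up grouping, not the authors' method. It assumes each connected component is represented by a bounding box `(x0, y0, x1, y1)`; the names `box_gap`, `group_components`, and the `max_gap` threshold are hypothetical. It also compares every pair of boxes, so it does not reproduce the linear running time reported in the abstract.

```python
from itertools import combinations


def box_gap(a, b):
    """Euclidean gap between two axis-aligned boxes (x0, y0, x1, y1); 0 if they touch or overlap."""
    dx = max(a[0] - b[2], b[0] - a[2], 0)
    dy = max(a[1] - b[3], b[1] - a[3], 0)
    return (dx * dx + dy * dy) ** 0.5


def group_components(boxes, max_gap):
    """Merge boxes in order of increasing gap (Kruskal-style), stopping at max_gap.

    Returns a sorted list of groups, each a list of indices into `boxes`.
    """
    parent = list(range(len(boxes)))  # union-find forest, one tree per group

    def find(i):
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, shortest first (assumption: brute force is acceptable here).
    edges = sorted(
        (box_gap(boxes[i], boxes[j]), i, j)
        for i, j in combinations(range(len(boxes)), 2)
    )
    for gap, i, j in edges:
        if gap > max_gap:
            break  # remaining edges are longer still
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri  # join the two groups

    groups = {}
    for i in range(len(boxes)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Calling `group_components` with a small `max_gap` would group characters into words, and repeating it with larger thresholds would build lines and blocks, which is one plausible way to obtain a hierarchical physical page structure.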
Keywords :
document handling; page description languages; DSDL; Kruskal's algorithm; bitmap images; chemical documents; document processing; document style description language; layout analysis; layout recognition; markup language description; physical page structure; recursive parsing algorithm; scientific publications; style file; Algorithm design and analysis; Books; Chemical processes; Chemistry; Computer applications; Fault tolerance; Image converters; Markup languages; Tree data structures; Typesetting
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Document Analysis and Recognition, 1995., Proceedings of the Third International Conference on
Conference_Location :
Montreal, Que.
Print_ISBN :
0-8186-7128-9
Type :
conf
DOI :
10.1109/ICDAR.1995.599035
Filename :
599035