Author_Institution :
Dept. of Comput. Sci., California Univ., Berkeley, CA, USA
Abstract :
The electronic representation of scientific documents (journals, technical reports, program documentation, laboratory notebooks, etc.) presents challenges for several communities. We see five distinct groups concerned with electronic versions of scientific documents: (1) publishers of journals, texts and reference works, and their authors; (2) software publishers for OCR/document analysis and document formatting; (3) software publishers whose products access “contents semantics” from documents, including library keyword search programs, natural language search programs, database systems, visual presentation systems, mathematical computation systems, etc.; (4) institutions maintaining access to electronic libraries, which must be broadly construed to include data and programs of all sorts; and (5) individuals, and programs acting as their agents, who need to use these libraries to identify, locate and retrieve relevant documents. It would be good to have a convergence in design and standards for encoding new or pre-existing (typically paper-based) documents in order to meet the needs of all these groups. Various efforts, some loosely coordinated but just as often competing, are trying to set standards and build tools. This paper discusses where we are headed.
Keywords :
electronic publishing; OCR; authors; contents semantics; database systems; document analysis; document formatting; electronic library access; electronic representation; journal publishers; library keyword search programs; mathematical computation systems; natural language search programs; paper-based documents; reference works; scientific documents; software publishers; standards; visual presentation systems; Database systems; Documentation; Information retrieval; Keyword search; Natural languages; Optical character recognition software; Software libraries; Software maintenance; Text analysis; Writing