DocumentCode
2530052
Title
Document digitization technology and its application for digital library in China
Author
Ding, Xiaoqing ; Wen, Di ; Peng, Liangrui ; Liu, Changsong
Author_Institution
Dept. of Electron. Eng., Tsinghua Univ., Beijing, China
fYear
2004
fDate
2004
Firstpage
46
Lastpage
53
Abstract
We introduce research on document digitization technology and its applications in constructing digital libraries in China. We focus on two major objectives of document digitization technologies: performance and efficiency. Taking the most representative TH-OCR product as an example, we present up-to-date research achievements in China on both kernel OCR technologies and peripheral technologies. The kernel technologies include high-performance multilingual (Chinese, Japanese, Korean and English) text recognition and layout analysis, understanding and reconstruction; the peripheral technologies include the network document digitization workflow and intelligent proofreading, which greatly improve efficiency. TH-OCR applications produce two types of final output digital documents: one is the reconstructed electronic document with the full text and layout information of the original paper-based document; the other is the multilevel document with an OCR output text layer under the image layer. Numerous applications indicate that current technologies can greatly facilitate the mass-volume digitization labour in building digital library infrastructure.
Keywords
digital libraries; document image processing; optical character recognition; text analysis; TH-OCR product; digital library; document digitization technology; electronic document; intelligent proofreading; kernel OCR technology; layout analysis; mass-volume digitization labour; multilingual character recognition; network document digitization workflow; paper-based document; peripheral technology; text recognition; Automation; Books; Character recognition; Humans; Image reconstruction; Intelligent networks; Kernel; Laboratories; Optical character recognition software; Software libraries;
fLanguage
English
Publisher
IEEE
Conference_Titel
Document Image Analysis for Libraries, 2004. Proceedings. First International Workshop on
Print_ISBN
0-7695-2088-X
Type
conf
DOI
10.1109/DIAL.2004.1263236
Filename
1263236