Title :
Cross-Language Sensitive Words Distribution Map: A Novel Recognition-Based Document Understanding Method for Uighur and Tibetan
Author :
Bing Su ; Xiaoqing Ding ; Liangrui Peng ; Changsong Liu
Author_Institution :
Dept. of Electron. Eng., Tsinghua Univ., Beijing, China
Abstract :
Cross-language document recognition and understanding are urgently needed in practice and have broad application prospects. In this paper, we propose a novel recognition-based document understanding method for Uighur and Tibetan, termed the "cross-language sensitive words distribution map" (CSWDM). In our unified recognition-understanding framework, digital Uighur/Tibetan document images are first recognized using OCR technology. CSWDM then labels the sensitive words with their Chinese information, either on the recognized transcriptions or directly on the original digital images, so that the spatial location and occurrence frequency of these sensitive words are represented intuitively. With this information, readers can roughly understand the theme and meaning of the cross-language documents.
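The labeling step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes the OCR stage has already produced a list of recognized words with bounding boxes, and that a lexicon maps each sensitive word to a Chinese gloss. The names `OcrWord` and `build_distribution_map`, the placeholder tokens, and the glosses are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class OcrWord:
    """One recognized word (hypothetical OCR output format)."""
    text: str
    box: tuple  # (x, y, width, height) on the page image


def build_distribution_map(ocr_words, lexicon):
    """Return (labels, counts) for the sensitive words found on a page.

    labels: list of (box, source_word, chinese_gloss), i.e. where to draw each tag.
    counts: Counter mapping each chinese_gloss to its occurrence frequency.
    """
    labels = []
    counts = Counter()
    for word in ocr_words:
        gloss = lexicon.get(word.text)  # exact match only; an assumption
        if gloss is not None:
            labels.append((word.box, word.text, gloss))
            counts[gloss] += 1
    return labels, counts


# Placeholder data: "term_a"/"term_b" stand in for recognized
# Uighur or Tibetan words; the glosses are illustrative only.
lexicon = {"term_a": "示例甲", "term_b": "示例乙"}
ocr_words = [
    OcrWord("term_a", (10, 10, 40, 12)),
    OcrWord("other", (60, 10, 30, 12)),
    OcrWord("term_b", (10, 30, 40, 12)),
    OcrWord("term_a", (60, 30, 40, 12)),
]
labels, counts = build_distribution_map(ocr_words, lexicon)
```

Here `labels` carries the spatial location of each match and `counts` its occurrence frequency, the two quantities the abstract says the map conveys. Exact string matching is a simplification; a real system would probably need to tolerate OCR errors and inflected word forms.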
Keywords :
document image processing; image recognition; natural language processing; word processing; CSWDM labels; Chinese information; OCR technology; cross-language documents; cross-language sensitive word distribution map; digital Tibetan document images; digital Uighur document images; occurrence frequency; recognition-based Tibetan document understanding method; recognition-based Uighur document understanding method; space location; transcription recognition; unified recognition-understanding framework; Character recognition; Databases; Image recognition; Optical character recognition software; Tagging; Text recognition; Tibetan; Uighur; character recognition; cross-language sensitive words distribution map; document understanding; ethnic language;
Conference_Title :
Document Analysis and Recognition (ICDAR), 2013 12th International Conference on
Conference_Location :
Washington, DC
DOI :
10.1109/ICDAR.2013.58