DocumentCode :
3486222
Title :
Cross-Language Sensitive Words Distribution Map: A Novel Recognition-Based Document Understanding Method for Uighur and Tibetan
Author :
Bing Su ; Xiaoqing Ding ; Liangrui Peng ; Changsong Liu
Author_Institution :
Dept. of Electron. Eng., Tsinghua Univ., Beijing, China
fYear :
2013
fDate :
25-28 Aug. 2013
Firstpage :
255
Lastpage :
259
Abstract :
Cross-language document recognition and understanding address pressing practical needs and have broad application prospects. In this paper, we propose a novel recognition-based method for understanding Uighur and Tibetan documents, termed the "cross-language sensitive words distribution map" (CSWDM). In our unified recognition-understanding framework, digital Uighur/Tibetan document images are first recognized using OCR technology. CSWDM then labels the Chinese information of sensitive words on the recognized transcriptions or directly on the original digital images, so that the spatial location and occurrence frequency of these sensitive words can be represented intuitively. With this information, readers can roughly understand the theme and meaning of the cross-language documents.
Keywords :
document image processing; image recognition; natural language processing; word processing; CSWDM labels; Chinese information; OCR technology; cross-language documents; cross-language sensitive word distribution map; digital Tibetan document images; digital Uighur document images; occurrence frequency; recognition-based Tibetan document understanding method; recognition-based Uighur document understanding method; space location; transcription recognition; unified recognition-understanding framework; Character recognition; Databases; Image recognition; Optical character recognition software; Tagging; Text recognition; Tibetan; Uighur; character recognition; cross-language sensitive words distribution map; document understanding; ethnic language;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Document Analysis and Recognition (ICDAR), 2013 12th International Conference on
Conference_Location :
Washington, DC
ISSN :
1520-5363
Type :
conf
DOI :
10.1109/ICDAR.2013.58
Filename :
6628623