DocumentCode :
2012055
Title :
Collecting Handwritten Nom Character Patterns from Historical Document Pages
Author :
Truyen Van Phan ; Zhu, Bilan ; Nakagawa, Masaki
Author_Institution :
Dept. of Comput. & Inf. Sci., Tokyo Univ. of Agric. & Technol., Tokyo, Japan
fYear :
2012
fDate :
27-29 March 2012
Firstpage :
344
Lastpage :
348
Abstract :
In this paper, we present methods for segmenting Nom historical documents and clustering character patterns to build a Nom character pattern database. Nom is an ideographic script used to write Vietnamese from the 10th century to the 20th century. However, this heritage is nearly lost. To preserve the wisdom and knowledge expressed in Nom, recognition and digitization are indispensable. Because there is no OCR for Nom yet, we have to start by collecting patterns. We employed a projection-profile-based method to segment hundreds of pages into individual characters. We then implemented a combination of Chinese OCR-based clustering and K-means clustering to group characters into categories. The experiment shows that the proposed system helps collect character patterns effectively. Moreover, it reveals that many character classes have been lost or left uncategorized so far.
Keywords :
document image processing; handwritten character recognition; history; optical character recognition; pattern clustering; Chinese OCR-based clustering; K-means clustering; Nom character pattern database; Nom historical documents segmentation; Vietnamese; character patterns clustering; handwritten Nom character patterns collection; historical document pages; ideographic script; Accuracy; Character recognition; Databases; Image segmentation; Libraries; Noise; Optical character recognition software; Chu Nom; Han Nom; Vietnamese ancient text; clustering; document image analysis; historical document; offline character database; pattern collection; segmentation;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Document Analysis Systems (DAS), 2012 10th IAPR International Workshop on
Conference_Location :
Gold Coast, QLD
Print_ISBN :
978-1-4673-0868-7
Type :
conf
DOI :
10.1109/DAS.2012.25
Filename :
6195391