DocumentCode :
2146174
Title :
Chinese Keyword Spotting Using Knowledge-Based Clustering
Author :
Xia, Yong ; Wang, Kuanquan ; Li, Mingwei
Author_Institution :
Sch. of Comput. Sci. & Technol., Harbin Inst. of Technol., Harbin, China
fYear :
2011
fDate :
18-21 Sept. 2011
Firstpage :
789
Lastpage :
793
Abstract :
Content-based document image retrieval is a new and promising research area. Indexing documents directly from image content, without OCR, is more general and convenient. However, content-based Chinese document retrieval is difficult because of the structural complexity of Chinese characters and the large number of character classes. Few papers address this issue, which is the focus of this paper. The paper presents a novel knowledge-based clustering algorithm and a serial batch clustering mechanism for large data sets. The knowledge is derived from an artificial document image collection. High-frequency Chinese characters are edited and synthesized into images automatically. Cluster IDs are used to index the characters. A Dream of Red Mansions, a famous work of classical Chinese literature containing nearly one million characters, is used to evaluate the performance of Chinese keyword spotting. Experimental results confirm the effectiveness of knowledge-based clustering and its application to Chinese keyword spotting.
Keywords :
content-based retrieval; document image processing; image retrieval; knowledge based systems; natural language processing; pattern clustering; Chinese character structure complexity; Chinese keyword spotting; Chinese literature work; Dream of Red Mansions; OCR; content based document image retrieval; document indexing; knowledge based clustering; serial batch clustering; Clustering algorithms; Complexity theory; Feature extraction; Image retrieval; Indexing; Knowledge based systems; Optical character recognition software; Content-based Chinese keyword spotting; Document image synthesis; Knowledge-based clustering; Serial batch clustering;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Document Analysis and Recognition (ICDAR), 2011 International Conference on
Conference_Location :
Beijing
ISSN :
1520-5363
Print_ISBN :
978-1-4577-1350-7
Electronic_ISBN :
1520-5363
Type :
conf
DOI :
10.1109/ICDAR.2011.162
Filename :
6065419