DocumentCode :
2168590
Title :
A hybrid case-based and rule-based for metadata extraction on heterogeneous Thai documents
Author :
Khankasikam, Krisda
Author_Institution :
Sch. of Inf., Commun. & Technol., Naresuan Univ. Phayao, Muang Phayao, Thailand
Volume :
1
fYear :
2010
fDate :
26-28 Feb. 2010
Firstpage :
312
Lastpage :
317
Abstract :
This paper reports experience with a human-assisted process for extracting metadata from Thai documents. A growing number of Thai archives are being placed online for sharing, because the Internet infrastructure for global data access is fully functional. However, many Thai archives hold documents that lack metadata. The lack of metadata hinders not only the discovery and dissemination of these documents over the Internet but also their connectivity with other documents. Extracting these metadata elements manually is highly labor-intensive, costly, and time-consuming for large document sets, so automation is a key idea for solving the problem; however, most existing automated metadata extraction approaches focus on specific domains and homogeneous documents. This paper proposes a combined case-based and rule-based metadata extraction approach to address these issues. The key idea for handling heterogeneity is to classify documents into equivalent groups using a rule-based approach, so that each group contains only similar documents. Then, for each document group, the system applies a case-based reasoning cycle that includes a process for extracting metadata from the documents in the group. The system achieves precision of 62.31% to 90.78%, depending on the characteristics of the data set.
Keywords :
Internet; case-based reasoning; document image processing; meta data; Internet infrastructure; Thai archives; heterogeneous Thai documents; hybrid case-based; hybrid rule-based; metadata extraction; Artificial intelligence; Communications technology; Costs; Data mining; Internet; Organizing; Problem-solving; Software libraries; Case-based Reasoning; Metadata Extraction; Rule-based Reasoning; Thai Documents;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computer and Automation Engineering (ICCAE), 2010 The 2nd International Conference on
Conference_Location :
Singapore
Print_ISBN :
978-1-4244-5585-0
Electronic_ISBN :
978-1-4244-5586-7
Type :
conf
DOI :
10.1109/ICCAE.2010.5451943
Filename :
5451943