Title :
Automatic metadata extraction and classification of spreadsheet documents based on layout similarity
Author :
Chatvichienchai, Somchai
Author_Institution :
Dept. of Inf. & Media Studies, Univ. of Nagasaki, Nagasaki, Japan
Date :
Nov. 29 2011-Dec. 1 2011
Abstract :
Effective information search is becoming a key success factor for business. Metadata is an essential part of modern information systems because it helps people find relevant documents across disparate repositories. Automatic document metadata extraction has received attention in recent years, as it is an important step in generating powerful search indices that support effective information search. This paper proposes a method that automatically extracts and classifies metadata from spreadsheets whose layout is similar to that of a given sample spreadsheet for which the metadata has already been defined. Metadata is classified by document type (e.g. purchase order, sales report, etc.) and by data context (e.g. customer name, order date, etc.), so that users can specify the meaning of each keyword in a search query. As a result, the search engine presented in this work returns results that match the user's search intention more closely than those of conventional keyword search engines.
Keywords :
classification; document handling; meta data; query processing; search engines; spreadsheet programs; automatic document metadata extraction; information search; layout similarity; metadata classification; search engine; search query; spreadsheet document classification; Crawlers; Data mining; Indexes; Layout; Organizations; Search problems; XML;
Conference_Title :
Advanced Information Management and Service (ICIPM), 2011 7th International Conference on
Conference_Location :
Jeju
Print_ISBN :
978-1-4577-0471-0