Regional Information Center for Science and Technology - Automatic metadata extraction and classification of spreadsheet documents based on layout similarity

DocumentCode :

564069

Title :

Automatic metadata extraction and classification of spreadsheet documents based on layout similarity

Author :

Chatvichienchai, Somchai

Author_Institution :

Dept. of Inf. & Media Studies, Univ. of Nagasaki, Nagasaki, Japan

fYear :

2011

fDate :

Nov. 29, 2011 - Dec. 1, 2011

Firstpage :

Lastpage :

Abstract :

Effective information search is becoming a key to business success. Metadata is an essential part of modern information systems because it helps people find relevant documents across disparate repositories. Automatic document metadata extraction has received attention in recent years because it is an important step in generating powerful search indices that support effective information search. The objective of this paper is to propose an innovative method that automatically performs metadata extraction and classification on spreadsheets whose layout is similar to that of a given sample spreadsheet with previously defined metadata. Metadata is classified by document type (e.g., purchase order, sales report) and by data context (e.g., customer name, order date), so that users can specify the meaning of each keyword in a search query. As a result, the search engine proposed in this work returns results that match the user's search intention more closely than those of conventional keyword search engines.

Keywords :

classification; document handling; meta data; query processing; search engines; spreadsheet programs; automatic document metadata extraction; information search; layout similarity; metadata classification; search engine; search query; spreadsheet document classification; Crawlers; Data mining; Indexes; Layout; Organizations; Search problems; XML;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Advanced Information Management and Service (ICIPM), 2011 7th International Conference on

Conference_Location :

Jeju

Print_ISBN :

978-1-4577-0471-0

Type :

conf

Filename :

6222145

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=564069