Title :
Graphics Classification for Enterprise Knowledge Management
Author :
Djordjevic, Divna ; Ghani, Rayid
Author_Institution :
Accenture Technol. Labs., Sophia Antipolis, France
Abstract :
Enterprise content repositories often consist of business documents comprising not only traditional text data but also graphics (org charts, graphs, architecture diagrams, etc.) that get reused by people across the enterprise. Despite this diversity of content, most of the research in enterprise search has focused on improving document search. We describe a machine learning approach for graphics classification that automatically classifies graphics within enterprise documents into an enterprise graphics taxonomy and enables graphics search functionality to augment traditional document-centric enterprise search. This allows legacy enterprise documents to be automatically converted into a reusable, tagged graphics repository. Our approach works by extracting reusable graphics from enterprise documents and performing feature extraction to create textual, visual and structural features that are subsequently used to classify these graphics. We provide experimental results on a real-world data set from Accenture. The contributions of this work are automating the creation of a categorized graphics database for enterprise KM systems, studying the utility of different feature sets, and demonstrating that existing classification and feature selection methods are appropriate for this task. Finally, we describe several applications currently being deployed at Accenture that are enabled by the categorized graphics repository.
Keywords :
computer graphics; feature extraction; image classification; knowledge management; visual databases; Accenture; business document; categorized graphics database; categorized graphics repository; document search; document-centric enterprise search; enterprise content repository; enterprise document; enterprise graphics taxonomy; enterprise knowledge management; graphics classification; graphics search functionality; machine learning; reusable graphics; text data; classification; enterprise search; feature selection; text and graphics analysis
Conference_Title :
Data Mining Workshops (ICDMW), 2010 IEEE International Conference on
Conference_Location :
Sydney, NSW
Print_ISBN :
978-1-4244-9244-2
Electronic_ISBN :
978-0-7695-4257-7
DOI :
10.1109/ICDMW.2010.149