• DocumentCode
    2192310
  • Title
    Graphics Classification for Enterprise Knowledge Management
  • Author
    Djordjevic, Divna ; Ghani, Rayid
  • Author_Institution
    Accenture Technol. Labs., Sophia Antipolis, France
  • fYear
    2010
  • fDate
    13-13 Dec. 2010
  • Firstpage
    562
  • Lastpage
    569
  • Abstract
    Enterprise content repositories often consist of business documents that contain not only traditional text data but also graphics (org charts, graphs, architecture diagrams, etc.) that are reused by people across the enterprise. Despite this diversity of content, most research in enterprise search has focused on improving document search. We describe a machine learning approach that automatically classifies graphics within enterprise documents into an enterprise graphics taxonomy and enables graphics search functionality to augment traditional document-centric enterprise search. This allows legacy enterprise documents to be automatically converted into a reusable, tagged graphics repository. Our approach extracts reusable graphics from enterprise documents, then performs feature extraction to create textual, visual, and structural features that are subsequently used to classify these graphics. We provide experimental results on a real-world data set from Accenture. The contributions of this work are automating the creation of a categorized graphics database for enterprise KM systems, studying the utility of different feature sets, and demonstrating that existing classification and feature selection methods are appropriate for this task. Finally, we describe several applications currently being deployed at Accenture that are enabled by the categorized graphics repository.
  • Keywords
    computer graphics; feature extraction; image classification; knowledge management; visual databases; Accenture; business document; categorized graphics database; categorized graphics repository; document search; document-centric enterprise search; enterprise content repository; enterprise document; enterprise graphics taxonomy; enterprise knowledge management; graphics classification; graphics search functionality; machine learning; reusable graphics; text data; classification; enterprise search; feature selection; text and graphics analysis
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Data Mining Workshops (ICDMW), 2010 IEEE International Conference on
  • Conference_Location
    Sydney, NSW
  • Print_ISBN
    978-1-4244-9244-2
  • Electronic_ISBN
    978-0-7695-4257-7
  • Type
    conf
  • DOI
    10.1109/ICDMW.2010.149
  • Filename
    5693347