• DocumentCode
    3437655
  • Title

    Characterization of Corpora from Enterprise Technology Creation for Retrieval and Mining

  • Author

    Deolalikar, Vinay

  • Author_Institution
    HP-Autonomy Res., Sunnyvale, CA, USA
  • fYear
    2013
  • fDate
    7-10 Dec. 2013
  • Firstpage
    365
  • Lastpage
    369
  • Abstract
    Enterprise information management (EIM) deals with the demands placed upon enterprise unstructured information by applications such as eDiscovery, compliance, and information lifecycle management. Each of these applications poses a unique challenge to the retrieval and data mining of enterprise unstructured information. However, the study of EIM as a research field has long been hampered by the lack of available enterprise corpora. Due to this paucity of enterprise datasets, much of the research on information retrieval and mining that is meant for EIM is benchmarked on corpora that differ vastly in structure from a typical enterprise corpus. An important category of enterprise corpora is those that arise during technology creation in an enterprise. Such corpora take center stage, for example, during eDiscovery requests arising from technology patent related litigation: an area of immense commercial impact. In this paper, we highlight the primary characteristics of enterprise corpora that arise during technology creation. At a high level, these properties are project structure, heterogeneity, collaborations, and skewness along various axes. We then study these characteristics in a carefully chosen enterprise corpus from a technology creation effort at a Fortune 10 corporation. In summary, we study the salient features of enterprise corpora and emphasize their structural properties. We hope that our study will spur effort in devising retrieval and mining techniques that are designed for EIM.
  • Keywords
    business data processing; data mining; information retrieval; EIM; corpora characterization; enterprise information management; enterprise technology creation; enterprise unstructured information data mining; enterprise unstructured information retrieval; Aggregates; Benchmark testing; Collaboration; Data mining; Information management; Information retrieval; Organizations; Enterprise Corpora; Enterprise Information Management; Technology Creation;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining Workshops (ICDMW), 2013 IEEE 13th International Conference on
  • Conference_Location
    Dallas, TX
  • Print_ISBN
    978-1-4799-3143-9
  • Type

    conf

  • DOI
    10.1109/ICDMW.2013.62
  • Filename
    6753943