• DocumentCode
    2595819
  • Title

    Classification of Distributed Data Using Topic Modeling and Maximum Variation Sampling

  • Author

    Patton, Robert M. ; Beaver, Justin M. ; Potok, Thomas E.

  • fYear
    2011
  • fDate
    4-7 Jan. 2011
  • Firstpage
    1
  • Lastpage
    5
  • Abstract
    From a management perspective, understanding the information that exists on a network and how it is distributed provides a critical advantage. This work explores the use of topic modeling as an approach to automatically determine the classes of information that exist on an organization's network, and then use the resultant topics as centroid vectors for the classification of individual documents in order to understand the distribution of information topics across the enterprise network. The approach is tested using the 20 Newsgroups dataset.
  • Keywords
    business data processing; distributed processing; document handling; pattern classification; sampling methods; centroid vector; distributed data classification; document classification; enterprise network; information topic; maximum variation sampling; network information; organization network; topic modeling; Classification algorithms; Computational modeling; Distributed databases; Entropy; Equations; Runtime; Training;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    System Sciences (HICSS), 2011 44th Hawaii International Conference on
  • Conference_Location
    Kauai, HI
  • ISSN
    1530-1605
  • Print_ISBN
    978-1-4244-9618-1
  • Type

    conf

  • DOI
    10.1109/HICSS.2011.101
  • Filename
    5718857