• DocumentCode
    3250433
  • Title
    Distributed online Big Data classification using context information
  • Author
    Tekin, Cem; Van der Schaar, Mihaela
  • Author_Institution
    Dept. of Electr. Eng., Univ. of California, Los Angeles, Los Angeles, CA, USA
  • fYear
    2013
  • fDate
    2-4 Oct. 2013
  • Firstpage
    1435
  • Lastpage
    1442
  • Abstract
    Distributed, online data mining systems have emerged as a result of applications requiring analysis of large amounts of correlated and high-dimensional data produced by multiple distributed data sources. We propose a distributed online data classification framework where data is gathered by distributed data sources and processed by a heterogeneous set of distributed learners which learn online, at run-time, how to classify the different data streams either by using their locally available classification functions or by helping each other by classifying each other's data. Importantly, since the data is gathered at different locations, sending the data to another learner to process incurs additional costs such as delays, and hence this is beneficial only if the benefits of a better classification exceed the costs. We model the problem of joint classification by the distributed and heterogeneous learners from multiple data sources as a distributed contextual bandit problem where each data instance is characterized by a specific context. We develop a distributed online learning algorithm for which we can prove sublinear regret. Compared to prior work in distributed online data mining, our work is the first to provide analytic regret results characterizing the performance of the proposed algorithm.
  • Keywords
    Big Data; data mining; learning (artificial intelligence); available classification functions; context information; distributed contextual bandit problem; distributed learners; distributed online big data classification framework; distributed online data mining; distributed online learning algorithm; high dimensional data; multiple data sources; multiple distributed data sources; online data mining systems; sublinear regret; Accuracy; Context; Data mining; Distributed databases; Nickel; Partitioning algorithms; Training;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Communication, Control, and Computing (Allerton), 2013 51st Annual Allerton Conference on
  • Conference_Location
    Monticello, IL
  • Print_ISBN
    978-1-4799-3409-6
  • Type
    conf
  • DOI
    10.1109/Allerton.2013.6736696
  • Filename
    6736696
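
The trade-off described in the abstract (classify locally, or pay a cost to have another learner classify the data, depending on the context) can be illustrated with a small single-learner simulation. This is a minimal sketch under stated assumptions, not the algorithm from the paper: it assumes a scalar context in [0, 1], a fixed uniform partition of the context space, and a standard UCB1-style index with the cost subtracted from the estimated accuracy. The names `ContextualLearner`, `simulate`, the arm labels `"local"` and `"remote"`, and all accuracy and cost values are hypothetical.

```python
import math
import random


class ContextualLearner:
    """Illustrative learner: uniform context partition plus a UCB1-style index.

    Assumption: this is a generic contextual-bandit sketch, not the paper's method.
    """

    def __init__(self, arms, costs, num_bins=4):
        # arms: hypothetical labels for a local classifier or a remote learner
        # costs: per-arm cost, e.g. the delay of sending data to another learner
        self.arms = list(arms)
        self.costs = dict(costs)
        self.num_bins = num_bins
        self.counts = [{a: 0 for a in self.arms} for _ in range(num_bins)]
        self.means = [{a: 0.0 for a in self.arms} for _ in range(num_bins)]
        self.t = 0

    def _bin(self, context):
        # Map a context in [0, 1] to the index of its partition cell.
        return min(int(context * self.num_bins), self.num_bins - 1)

    def choose(self, context):
        self.t += 1
        b = self._bin(context)
        # Try every arm once in each cell before using the index.
        for a in self.arms:
            if self.counts[b][a] == 0:
                return a

        def score(a):
            bonus = math.sqrt(2.0 * math.log(self.t) / self.counts[b][a])
            # Estimated accuracy minus cost, plus an exploration bonus.
            return self.means[b][a] - self.costs[a] + bonus

        return max(self.arms, key=score)

    def update(self, context, arm, correct):
        # Incremental update of the sample-mean accuracy for (cell, arm).
        b = self._bin(context)
        self.counts[b][arm] += 1
        n = self.counts[b][arm]
        self.means[b][arm] += (float(correct) - self.means[b][arm]) / n


def simulate(rounds=5000, seed=0):
    rng = random.Random(seed)

    def accuracy(arm, x):
        # Hypothetical accuracies: local is better for low contexts,
        # the remote learner is better for high contexts.
        if arm == "local":
            return 0.9 if x < 0.5 else 0.4
        return 0.5 if x < 0.5 else 0.95

    learner = ContextualLearner(
        ["local", "remote"], {"local": 0.0, "remote": 0.1}, num_bins=4
    )
    picks = {
        "low": {"local": 0, "remote": 0},
        "high": {"local": 0, "remote": 0},
    }
    for _ in range(rounds):
        x = rng.random()
        arm = learner.choose(x)
        correct = rng.random() < accuracy(arm, x)
        learner.update(x, arm, correct)
        picks["low" if x < 0.5 else "high"][arm] += 1
    return picks


if __name__ == "__main__":
    print(simulate())
```

In this toy setting the learner should come to prefer `"local"` for low contexts and `"remote"` for high contexts, since only there does the remote accuracy gain outweigh the assumed cost of 0.1. The multi-learner coordination and the regret analysis that the abstract refers to are outside the scope of this sketch.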