• DocumentCode
    633071
  • Title

    Countering the Concept-Drift Problem in Big Data Using iOVFDT

  • Author

    Hang Yang ; Fong, Simon

  • Author_Institution
    Dept. of Comput. & Inf. Sci., Univ. of Macau, Macau, China
  • fYear
    2013
  • fDate
    June 27 2013-July 2 2013
  • Firstpage
    126
  • Lastpage
    132
  • Abstract
    How to efficiently uncover the knowledge hidden within massive and big data remains an open problem. One of the challenges is the issue of 'concept drift' in streaming data flows. Concept drift is a well-known problem in data analytics, in which the statistical properties of the attributes and their target classes shift over time, making the trained model less accurate. Many methods have been proposed for data mining in batch mode. Stream mining represents a new generation of data mining techniques, in which the model is updated in one pass whenever new data arrive. This one-pass mechanism is inherently adaptive and hence potentially more robust than its predecessors in handling concept drift in data streams. In this paper, we evaluate the performance of a family of decision-tree-based data stream mining algorithms. The advantage of incremental decision tree learning is the set of rules that can be extracted from the induced model. The extracted rules, in the form of predicate logics, can be used subsequently in many decision-support applications. However, the induced decision tree must be both accurate and compact, even in the presence of concept drift. We compare the performance of three typical incremental decision tree algorithms (VFDT [2], ADWIN [3], iOVFDT [4]) in dealing with concept-drift data. Both synthetic and real-world drift data are used in the experiment. iOVFDT is found to produce superior results.
  • Keywords
    data analysis; data mining; decision support systems; decision trees; learning (artificial intelligence); statistical analysis; ADWIN; batch mode; big data; concept-drift problem; data analytics; decision-support applications; decision-tree-based data stream mining algorithms; iOVFDT; incremental decision tree learning; one-pass mechanism; predicate logics; statistical properties; stream mining; streaming data flows; Accuracy; Adaptation models; Data handling; Data mining; Decision trees; Information management; Vegetation; classification; concept drift; data stream mining; incremental decision tree;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Big Data (BigData Congress), 2013 IEEE International Congress on
  • Conference_Location
    Santa Clara, CA
  • Print_ISBN
    978-0-7695-5006-0
  • Type

    conf

  • DOI
    10.1109/BigData.Congress.2013.25
  • Filename
    6597128