• DocumentCode
    3487460
  • Title
    A Stream-Based Semi-supervised Active Learning Approach for Document Classification
  • Author
    Bouguelia, Mohamed-Rafik; Belaid, Yolande; Belaid, Abdel
  • Author_Institution
    Univ. de Lorraine - LORIA, Vandoeuvre-les-Nancy, France
  • fYear
    2013
  • fDate
    25-28 Aug. 2013
  • Firstpage
    611
  • Lastpage
    615
  • Abstract
    We consider an industrial context in which a stream of unlabelled documents becomes available progressively over time. Based on an adaptive incremental neural gas algorithm (AING), we propose a new stream-based semi-supervised active learning method (A2ING) for document classification. The method actively queries a human annotator for the class labels of the documents that are most informative for learning, as judged by an uncertainty measure. It maintains a model as a dynamically evolving graph topology of labelled document representatives that we call neurons. Experiments on different real datasets show that the proposed method requires on average only 36.3% of the incoming documents to be labelled in order to learn a model that achieves an average precision gain of 2.15-3.22% over traditional supervised learning with fully labelled training documents.
  • Keywords
    graph theory; learning (artificial intelligence); pattern classification; query processing; text analysis; A2ING; AING algorithm; adaptive incremental neural gas algorithm; document class-label querying; document classification; dynamically evolving graph topology; labelled document-representatives; stream-based semisupervised active learning method; uncertainty measure; unlabelled documents; Labeling; Measurement uncertainty; Neurons; Testing; Topology; Training; Uncertainty; Active learning; Document classification; Incremental learning; data stream; semi-supervised learning;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis and Recognition (ICDAR), 2013 12th International Conference on
  • Conference_Location
    Washington, DC
  • ISSN
    1520-5363
  • Type
    conf
  • DOI
    10.1109/ICDAR.2013.126
  • Filename
    6628691
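
The abstract describes the general loop only: predict, measure uncertainty, query the annotator when uncertain, and update a model of labelled neurons. The following is a minimal Python sketch of a generic stream-based, uncertainty-driven active learner over labelled prototypes. It is not A2ING or AING, and every detail below is an assumption: documents are already numeric feature vectors; uncertainty is the ratio of the distance to the nearest neuron of the best class over the distance to the nearest neuron of the runner-up class; confident predictions are reused as labels (a simple self-training step standing in for the semi-supervised part); the graph edges of AING are omitted; and the class name, thresholds, and toy stream are hypothetical.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class PrototypeActiveLearner:
    """Hypothetical learner: labelled prototypes ("neurons") plus uncertainty-based querying."""

    def __init__(self, uncertainty_threshold: float = 0.8,
                 new_neuron_distance: float = 1.0, learning_rate: float = 0.1):
        self.uncertainty_threshold = uncertainty_threshold  # query the annotator at or above this value
        self.new_neuron_distance = new_neuron_distance      # farther than this from its class -> new neuron
        self.learning_rate = learning_rate                  # step size when moving a neuron
        self.neurons: List[Tuple[List[float], str]] = []
        self.queries = 0

    def _nearest_per_class(self, x: Vector) -> Dict[str, Tuple[float, List[float]]]:
        # For each known class, keep the distance to (and weights of) its closest neuron.
        best: Dict[str, Tuple[float, List[float]]] = {}
        for w, label in self.neurons:
            d = euclidean(x, w)
            if label not in best or d < best[label][0]:
                best[label] = (d, w)
        return best

    def predict(self, x: Vector) -> Tuple[Optional[str], float]:
        """Return (predicted label, uncertainty in [0, 1])."""
        best = self._nearest_per_class(x)
        if not best:
            return None, 1.0
        ranked = sorted(best.items(), key=lambda item: item[1][0])
        label, (d1, _) = ranked[0]
        if len(ranked) == 1:
            # Only one class seen so far: certain only if x is close to one of its neurons.
            return label, 0.0 if d1 <= self.new_neuron_distance else 1.0
        d2 = ranked[1][1][0]
        # A ratio near 1 means the two closest classes are about equally close.
        return label, d1 / d2 if d2 > 0 else 1.0

    def _learn(self, x: Vector, label: str) -> None:
        best = self._nearest_per_class(x)
        if label in best and best[label][0] <= self.new_neuron_distance:
            w = best[label][1]  # same list object stored in self.neurons, updated in place
            for i in range(len(w)):
                w[i] += self.learning_rate * (x[i] - w[i])
        else:
            self.neurons.append((list(x), label))

    def process(self, x: Vector, oracle: Callable[[Vector], str]) -> str:
        label, uncertainty = self.predict(x)
        if label is None or uncertainty >= self.uncertainty_threshold:
            label = oracle(x)  # ask the human annotator
            self.queries += 1
        self._learn(x, label)  # confident predictions are used as pseudo-labels
        return label


# Toy stream of (feature vector, true label); the oracle only reveals labels on request.
stream = [([0.0, 0.0], "invoice"), ([5.0, 5.0], "letter"), ([0.2, 0.1], "invoice"),
          ([4.8, 5.1], "letter"), ([2.5, 2.5], "letter")]
truth = {tuple(x): y for x, y in stream}
learner = PrototypeActiveLearner()
for x, _ in stream:
    learner.process(x, oracle=lambda doc: truth[tuple(doc)])
print(learner.queries, len(stream))  # prints "3 5"
```

On this toy stream the learner queries the first two documents (empty or single-class model), labels the two near-duplicates itself, and queries the ambiguous midpoint, which then becomes a new neuron. The distance-ratio measure is only one of many possible uncertainty measures; swapping it changes which documents are sent to the annotator.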