• DocumentCode
    3059940
  • Title

    Dimensionality reduction for active learning with nearest neighbour classifier in text categorisation problems

  • Author

    Davy, Michael ; Luz, Saturnino

  • Author_Institution
    Trinity Coll. Dublin, Dublin
  • fYear
    2007
  • fDate
    13-15 Dec. 2007
  • Firstpage
    292
  • Lastpage
    297
  • Abstract
    Dimensionality reduction techniques are commonly used in text categorisation problems to improve training and classification efficiency as well as to avoid overfitting. The best performing dimensionality reduction techniques for text categorisation are supervised and hence utilise the label information of the training data. Active learning is used to reduce the number of labelled training examples for problems where obtaining label information is expensive. Since the vast majority of data supplied to active learning are unlabelled, supervised dimensionality reduction techniques cannot be readily employed. For this reason, active learning in text categorisation problems does not perform dimensionality reduction, thereby restricting the choice of classifier. In this paper we investigate unsupervised dimensionality reduction techniques in active learning for text categorisation problems. Two unsupervised techniques are investigated, namely document frequency and principal components analysis. We empirically show increased performance of active learning, using a k-nearest neighbour classifier, when dimensionality reduction is applied using the unsupervised techniques.
  • Keywords
    data reduction; pattern classification; principal component analysis; text analysis; active learning; document frequency; k-nearest neighbour classifier; label information; principal components analysis; text categorisation problem; unsupervised dimensionality reduction; Application software; Artificial intelligence; Computer science; Educational institutions; Frequency; Machine learning; Principal component analysis; Supervised learning; Text categorization; Training data
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Machine Learning and Applications, 2007. ICMLA 2007. Sixth International Conference on
  • Conference_Location
    Cincinnati, OH
  • Print_ISBN
    978-0-7695-3069-7
  • Type

    conf

  • DOI
    10.1109/ICMLA.2007.9
  • Filename
    4457246