• DocumentCode
    592698
  • Title

    Automatic classification of academic documents using text mining techniques

  • Author

    Nunez, Haydemar ; Ramos, Edgar

  • Author_Institution
    Lab. de Intel. Artificial. Escuela de Comput., Univ. Central de Venezuela, Caracas, Venezuela
  • fYear
    2012
  • fDate
    1-5 Oct. 2012
  • Firstpage
    1
  • Lastpage
    7
  • Abstract
    This work presents an automatic classifier of undergraduate final projects based on text mining. The dataset, comprising documents from four professional categories, was represented by means of the vector space model with different index metrics. In addition, several dimensionality reduction techniques were applied to the word space. The classification model was built with the K-nearest neighbor algorithm. Using 10-fold cross-validation, the classifier reached a predictive accuracy of 82%. When it recommends up to two categories per document, to account for the interdisciplinary nature of the documents, accuracy rises to 95%. The classifier was integrated into an application for automatic reviewer assignment, which selects reviewers from among the teachers who belong to the recommended areas.
  • Keywords
    data mining; data reduction; educational administrative data processing; further education; pattern classification; text analysis; 10-fold cross-validations; K-nearest neighbor algorithm; automatic academic document classification; automatic classifier; automatic reviewer assignment; dimensionality reduction techniques; index metrics; predictive accuracy; professional categories; text mining techniques; undergraduate final projects; vector space model; word space; Accuracy; Chebyshev approximation; Classification algorithms; Computational modeling; Laboratories; Text mining; Vectors; K nearest neighbor algorithm; Text mining; classification models; documents categorization;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Informatica (CLEI), 2012 XXXVIII Conferencia Latinoamericana En
  • Conference_Location
    Medellin
  • Print_ISBN
    978-1-4673-0794-9
  • Type

    conf

  • DOI
    10.1109/CLEI.2012.6427167
  • Filename
    6427167
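
The pipeline outlined in the abstract (vector space representation, K-nearest neighbor classification, and a recommendation of up to two categories) can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not state which index metrics, dimensionality reduction techniques, distance measure, or value of K were used, so the TF-IDF weighting, cosine similarity, `k=3`, the function names, the toy documents, and the category labels below are all assumptions made for illustration.

```python
import math
from collections import Counter


def fit_idf(token_docs):
    """Compute smoothed inverse document frequencies from tokenized training documents."""
    n = len(token_docs)
    df = Counter(term for doc in token_docs for term in set(doc))
    return {term: math.log(n / count) + 1.0 for term, count in df.items()}


def vectorize(tokens, idf):
    """Map a token list to a sparse TF-IDF vector; terms unseen in training are dropped."""
    counts = Counter(t for t in tokens if t in idf)
    return {term: tf * idf[term] for term, tf in counts.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(query_vec, train_vecs, labels, k=3, top=2):
    """Return up to `top` categories, ranked by summed similarity over the k nearest neighbors."""
    scored = sorted(
        ((cosine(query_vec, vec), label) for vec, label in zip(train_vecs, labels)),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    votes = Counter()
    for similarity, label in scored:
        votes[label] += similarity
    return [label for label, _ in votes.most_common(top)]


# Hypothetical toy corpus; the real dataset and its four categories are not given in this record.
TRAIN = [
    ("neural network learning classifier training", "ai"),
    ("learning agents neural search heuristics", "ai"),
    ("sql relational database query index", "databases"),
    ("database transactions query optimization sql", "databases"),
    ("network protocol routing packets latency", "networks"),
    ("routing packets network congestion protocol", "networks"),
]

train_tokens = [text.split() for text, _ in TRAIN]
train_labels = [label for _, label in TRAIN]
idf = fit_idf(train_tokens)
train_vecs = [vectorize(tokens, idf) for tokens in train_tokens]

query = "neural classifier for database query optimization".split()
print(recommend(vectorize(query, idf), train_vecs, train_labels))
```

Returning a ranked list rather than a single label mirrors the two-category recommendation mentioned in the abstract: an interdisciplinary document that draws on two areas can surface both, and a reviewer could then be drawn from either recommended area.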