• DocumentCode
    2633967
  • Title

    Text categorization study case: Patents' application documents

  • Author

    de Oliveira Gomes, Neide ; Passos, Emmanuel Piceses Lopes

  • Author_Institution
    Electr. Eng. Dept., Pontifical Catholic Univ., Rio de Janeiro, Brazil
  • fYear
    2011
  • fDate
    21-23 June 2011
  • Firstpage
    446
  • Lastpage
    450
  • Abstract
    This paper presents computational methods for categorizing patent texts in Portuguese, drawing on techniques from machine learning and computational linguistics. The algorithm used was a modified k-Nearest Neighbor (k-NN) method, which showed good results, although it requires considerable computational time in the training stage. For the pre-processing step, the stemming method called StemmerPortuguese, which removes suffixes, was implemented with modifications, together with stopword removal and treatment of compound terms.
  • Keywords
    natural language processing; text analysis; Portuguese language; StemmerPortuguese; computational linguistics; computational time; k-NN; k-Nearest Neighbor method; machine learning; patents application documents; stemming method; text categorization; Classification algorithms; Databases; Equations; Informatics; Patents; Text categorization; Training; Categorization of Patents' Applications; Classification of Patent's Applications; Knowledge Discovery in Texts; Text Categorization; Text Classification
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Industrial Electronics and Applications (ICIEA), 2011 6th IEEE Conference on
  • Conference_Location
    Beijing
  • ISSN
    pending
  • Print_ISBN
    978-1-4244-8754-7
  • Electronic_ISBN
    pending
  • Type

    conf

  • DOI
    10.1109/ICIEA.2011.5975625
  • Filename
    5975625