DocumentCode
2633967
Title
Text categorization study case: Patents' application documents
Author
de Oliveira Gomes, Neide ; Passos, Emmanuel Piceses Lopes
Author_Institution
Electr. Eng. Dept., Pontifical Catholic Univ., Rio de Janeiro, Brazil
fYear
2011
fDate
21-23 June 2011
Firstpage
446
Lastpage
450
Abstract
This paper presents computational methods for patent text categorization in Portuguese, drawing on techniques from machine learning and computational linguistics. The algorithm used was a modified k-Nearest Neighbor (k-NN) method, which showed good results, although it requires considerable computational time in the training stage. For the pre-processing step, a stemming method called StemmerPortuguese, which removes suffixes, was implemented with modifications, together with stopword removal and treatment of compound terms.
Keywords
natural language processing; text analysis; Portuguese language; StemmerPortuguese; computational linguistics; computational time; k-NN; k-Nearest Neighbor method; machine learning; patents application documents; stemming method; text categorization; Classification algorithms; Databases; Equations; Informatics; Patents; Text categorization; Training; Categorization of Patents' Applications; Classification of Patent's Applications; Knowledge Discovery in Texts; Text Categorization; Text Classification;
fLanguage
English
Publisher
IEEE
Conference_Titel
Industrial Electronics and Applications (ICIEA), 2011 6th IEEE Conference on
Conference_Location
Beijing
ISSN
pending
Print_ISBN
978-1-4244-8754-7
Electronic_ISBN
pending
Type
conf
DOI
10.1109/ICIEA.2011.5975625
Filename
5975625