DocumentCode
424098
Title
Combining word based and word co-occurrence based sequence analysis for text categorization
Author
Luo, Xiao ; Zincir-Heywood, A. Nur
Author_Institution
Fac. of Comput. Sci., Dalhousie Univ., Halifax, NS, Canada
Volume
3
fYear
2004
fDate
26-29 Aug. 2004
Firstpage
1580
Abstract
This paper presents a text categorization system based on the combination of a hierarchical SOM encoding architecture and a specially designed kNN classifier. Through the encoding architecture, a document is encoded as sequences of neurons so that the sequences of words and word co-occurrences, as well as their frequencies, are preserved. The system achieves good performance (micro-averaged F1-measure of 0.98) on the experimental data set. This sequence analysis system for text categorization could automatically solve the high-dimensionality problem for large data sets. It could also be used for other data categorization tasks where sequence information is important.
Keywords
encoding; neural net architecture; pattern classification; self-organising feature maps; text analysis; document encoding; encoding architecture; kNN classifier; self organization map; text categorization; word cooccurrence based sequence analysis; Computer architecture; Computer science; Content management; Electronic mail; Encoding; Frequency; Information analysis; Machine learning; Neurons; Text categorization;
fLanguage
English
Publisher
IEEE
Conference_Titel
Machine Learning and Cybernetics, 2004. Proceedings of 2004 International Conference on
Print_ISBN
0-7803-8403-2
Type
conf
DOI
10.1109/ICMLC.2004.1382026
Filename
1382026