DocumentCode :
424098
Title :
Combining word based and word co-occurrence based sequence analysis for text categorization
Author :
Luo, Xiao ; Zincir-Heywood, A. Nur
Author_Institution :
Fac. of Comput. Sci., Dalhousie Univ., Halifax, NS, Canada
Volume :
3
fYear :
2004
fDate :
26-29 Aug. 2004
Firstpage :
1580
Abstract :
This paper presents a text categorization system based on the combination of a hierarchical SOM encoding architecture and a specially designed kNN classifier. Through the encoding architecture, a document is encoded as sequences of neurons, so that the sequences of words and word co-occurrences, as well as their frequencies, are preserved. The system achieves a micro-averaged F1-measure of 0.98 on the experimental data set. This sequence analysis system for text categorization could automatically address the high-dimensionality problem for large data sets. It could also be applied to other data categorization tasks in which sequence information is significant.
Keywords :
encoding; neural net architecture; pattern classification; self-organising feature maps; text analysis; document encoding; encoding architecture; kNN classifier; self organization map; text categorization; word cooccurrence based sequence analysis; Computer architecture; Computer science; Content management; Electronic mail; Encoding; Frequency; Information analysis; Machine learning; Neurons; Text categorization;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Machine Learning and Cybernetics, 2004. Proceedings of 2004 International Conference on
Print_ISBN :
0-7803-8403-2
Type :
conf
DOI :
10.1109/ICMLC.2004.1382026
Filename :
1382026