• DocumentCode
    424098
  • Title
    Combining word based and word co-occurrence based sequence analysis for text categorization
  • Author
    Luo, Xiao ; Zincir-Heywood, A. Nur
  • Author_Institution
    Fac. of Comput. Sci., Dalhousie Univ., Halifax, NS, Canada
  • Volume
    3
  • fYear
    2004
  • fDate
    26-29 Aug. 2004
  • Firstpage
    1580
  • Abstract
    This paper presents a text categorization system based on the combination of a hierarchical SOM (self-organizing map) encoding architecture and a specially designed kNN classifier. Through the encoding architecture, a document is encoded as sequences of neurons, so that the sequences of words and word co-occurrences, as well as their frequencies, are preserved. Good performance (a micro-averaged F1-measure of 0.98) is achieved on the experimental data set using this system. This sequence analysis system for text categorization could automatically solve the high-dimensionality problem for large data sets, and it could be applied to other data categorization tasks where sequence information is significant.
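    The abstract reports its result as a micro-averaged F1-measure. As a minimal sketch of how that metric is commonly computed (not the authors' evaluation code), the function below assumes each document has a set of gold categories and a set of predicted categories; true positives, false positives and false negatives are pooled over all documents and categories before precision, recall and F1 are taken. The function name micro_f1 and the category labels in the usage line are hypothetical.

```python
from typing import Sequence, Set


def micro_f1(gold: Sequence[Set[str]], predicted: Sequence[Set[str]]) -> float:
    """Micro-averaged F1 over per-document category sets (illustrative sketch)."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    tp = fp = fn = 0
    for g, p in zip(gold, predicted):
        tp += len(g & p)  # categories correctly assigned
        fp += len(p - g)  # categories assigned but not in the gold set
        fn += len(g - p)  # gold categories that were missed
    if tp == 0:
        return 0.0  # no correct assignments: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels: pooled TP=2, FP=1, FN=2, so P=2/3, R=1/2, F1=4/7.
print(round(micro_f1([{"earn"}, {"acq"}, {"earn", "trade"}],
                     [{"earn"}, {"earn"}, {"earn"}]), 4))  # 0.5714
```

    Because counts are pooled before averaging, frequent categories dominate the micro-averaged score; for single-label data where every document receives exactly one prediction, it reduces to accuracy.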
  • Keywords
    encoding; neural net architecture; pattern classification; self-organising feature maps; text analysis; document encoding; encoding architecture; kNN classifier; self organization map; text categorization; word cooccurrence based sequence analysis; Computer architecture; Computer science; Content management; Electronic mail; Encoding; Frequency; Information analysis; Machine learning; Neurons; Text categorization;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Machine Learning and Cybernetics, 2004. Proceedings of 2004 International Conference on
  • Print_ISBN
    0-7803-8403-2
  • Type
    conf
  • DOI
    10.1109/ICMLC.2004.1382026
  • Filename
    1382026