• DocumentCode
    3166114
  • Title
    Document Transformation for Multi-label Feature Selection in Text Categorization
  • Author
    Chen, Weizhu; Yan, Jun; Zhang, Benyu; Chen, Zheng; Yang, Qiang
  • Author_Institution
    Microsoft Res. Asia, Beijing
  • fYear
    2007
  • fDate
    28-31 Oct. 2007
  • Firstpage
    451
  • Lastpage
    456
  • Abstract
    Feature selection on multi-label documents for automatic text categorization is an under-explored research area. This paper presents a systematic document transformation framework that addresses the multi-label feature selection problem by transforming multi-label documents into single-label documents before applying standard feature selection algorithms. Under this framework, we undertake a comparative study of four intuitive document transformation approaches and propose a novel approach called entropy-based label assignment (ELA), which assigns label weights to a multi-label document based on label entropy. Three standard feature selection algorithms are used to evaluate the document transformation approaches and to verify their impact on multi-class text categorization problems. Using an SVM classifier and two multi-label benchmark text collections, we show that the choice of document transformation approach can significantly influence the performance of multi-class categorization and that our proposed approach, ELA, achieves better performance than all the other approaches.
  • Keywords
    support vector machines; text analysis; SVM classifier; document transformation evaluation; entropy-based label assignment; intuitive document transformation; label entropy; multiclass categorization; multiclass text categorization; multilabel document transformation; multilabel evaluation benchmark text collection; multilabel feature selection; single-label document; support vector machine; Algorithm design and analysis; Asia; Computer science; Data mining; Entropy; Explosives; Support vector machine classification; Support vector machines; Text categorization; Web sites
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining, 2007. ICDM 2007. Seventh IEEE International Conference on
  • Conference_Location
    Omaha, NE
  • ISSN
    1550-4786
  • Print_ISBN
    978-0-7695-3018-5
  • Type
    conf
  • DOI
    10.1109/ICDM.2007.18
  • Filename
    4470272