  • DocumentCode
    2207857
  • Title
    Multi-label Feature Selection for Graph Classification
  • Author
    Kong, Xiangnan; Yu, Philip S.
  • Author_Institution
    Dept. of Comput. Sci., Univ. of Illinois at Chicago, Chicago, IL, USA
  • fYear
    2010
  • fDate
    13-17 Dec. 2010
  • Firstpage
    274
  • Lastpage
    283
  • Abstract
    Graph classification has become an important and active research topic over the last decade, with a wide variety of real-world applications, e.g., drug activity prediction and kinase inhibitor discovery. Current research on graph classification focuses on single-label settings. However, in many applications, each graph can be assigned a set of multiple labels simultaneously. Extracting good features using the multiple labels of the graphs therefore becomes an important step before graph classification. In this paper, we study the problem of multi-label feature selection for graph classification and propose a novel solution, called gMLC, to efficiently search for optimal subgraph features for graph objects with multiple labels. Unlike existing feature selection methods in vector spaces, which assume the feature set is given, we perform multi-label feature selection for graph data progressively, together with the subgraph feature mining process. We derive an evaluation criterion, named gHSIC, to estimate the dependence between subgraph features and the multiple labels of graphs. We then propose a branch-and-bound algorithm that efficiently searches for optimal subgraph features by judiciously pruning the subgraph search space using multiple labels. Empirical studies on real-world tasks demonstrate that our feature selection approach can effectively boost multi-label graph classification performance and is more efficient because it prunes the subgraph search space using multiple labels.
  • Keywords
    data mining; feature extraction; graph theory; pattern classification; tree searching; branch-and-bound algorithm; feature mining; graph classification; graph object; multilabel feature selection; search space; vector space; feature selection; multi-label learning;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining (ICDM), 2010 IEEE 10th International Conference on
  • Conference_Location
    Sydney, NSW
  • ISSN
    1550-4786
  • Print_ISBN
    978-1-4244-9131-5
  • Electronic_ISBN
    1550-4786
  • Type
    conf
  • DOI
    10.1109/ICDM.2010.58
  • Filename
    5693981
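The abstract describes gHSIC as a criterion for the dependence between subgraph features and multiple labels, used inside a branch-and-bound search. The sketch below illustrates that general idea under stated assumptions; it is not the authors' gHSIC definition or gMLC code. It assumes linear kernels on a binary subgraph-indicator feature f and on the label matrix Y, the biased empirical HSIC tr(KHLH)/(n-1)^2 (which reduces to f^T M f / (n-1)^2 with M = HLH), and a pruning bound obtained by keeping only the positive entries of M inside the current feature's support. This bound holds for any supergraph, because a supergraph can only occur in a subset of the graphs that contain the parent subgraph. The function names `centered_label_kernel`, `hsic_score`, and `hsic_upper_bound`, and the toy data, are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def centered_label_kernel(labels: List[List[int]]) -> Matrix:
    """Return M = H L H, where L = Y Y^T is a linear kernel on the label
    vectors and H = I - (1/n) 11^T is the centering matrix (assumption:
    linear label kernel)."""
    n = len(labels)
    L = [[float(sum(a * b for a, b in zip(labels[i], labels[j])))
          for j in range(n)] for i in range(n)]
    row_mean = [sum(L[i]) / n for i in range(n)]
    col_mean = [sum(L[i][j] for i in range(n)) / n for j in range(n)]
    grand_mean = sum(row_mean) / n
    # Double centering: (HLH)_ij = L_ij - rowmean_i - colmean_j + grandmean
    return [[L[i][j] - row_mean[i] - col_mean[j] + grand_mean
             for j in range(n)] for i in range(n)]


def hsic_score(feature: List[int], M: Matrix) -> float:
    """Linear-kernel HSIC between one binary feature f and the labels:
    tr(K H L H) / (n-1)^2 with K = f f^T, i.e. f^T M f / (n-1)^2."""
    n = len(feature)
    quad = sum(M[i][j] for i in range(n) if feature[i]
               for j in range(n) if feature[j])
    return quad / (n - 1) ** 2


def hsic_upper_bound(feature: List[int], M: Matrix) -> float:
    """Upper bound on hsic_score for any feature whose support is a subset
    of this one's (e.g. a supergraph occurring in fewer graphs): sum only
    the positive entries of M inside the current support."""
    n = len(feature)
    quad = sum(max(M[i][j], 0.0) for i in range(n) if feature[i]
               for j in range(n) if feature[j])
    return quad / (n - 1) ** 2


if __name__ == "__main__":
    # Hypothetical toy data: 4 graphs, 2 labels each.
    Y = [[1, 0], [1, 1], [0, 1], [0, 0]]
    M = centered_label_kernel(Y)
    parent = [1, 1, 1, 0]  # subgraph g occurs in graphs 0, 1, 2
    child = [1, 1, 0, 0]   # a supergraph of g occurs only in graphs 0, 1
    best_so_far = 0.2      # hypothetical best score found earlier in the search
    # In a branch-and-bound search, g's whole subtree can be skipped when
    # its bound cannot beat the best score found so far.
    if hsic_upper_bound(parent, M) < best_so_far:
        print("prune subtree of g")
    print(hsic_score(parent, M), hsic_score(child, M),
          hsic_upper_bound(parent, M))
```

On this toy data the child's score is higher than its parent's, so the score itself is not anti-monotone along the subgraph lattice; that is why a separate bound, rather than the score, is what permits pruning.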