• DocumentCode
    1906464
  • Title

    Discovering Highly Informative Feature Set over High Dimensions

  • Author

    Chongsheng Zhang ; Florent Masseglia ; Xiangliang Zhang

  • Author_Institution
    Henan Univ., Kaifeng, China
  • Volume
    1
  • fYear
    2012
  • fDate
    7-9 Nov. 2012
  • Firstpage
    1059
  • Lastpage
    1064
  • Abstract
    For many textual collections, the number of features is very large, and many of these features are redundant. It is therefore desirable to have a small, succinct, yet highly informative collection of features that describes the key characteristics of a dataset. Information theory provides one tool for obtaining such a feature collection. In this paper, our main contribution is to improve the efficiency of selecting the most informative feature set over high-dimensional unlabeled data. We propose a heuristic theory for informative feature set selection from high-dimensional data. Moreover, we design data structures that enable us to compute the entropies of candidate feature sets efficiently. We also develop a simple pruning strategy that eliminates hopeless candidates at each forward selection step. Experiments on real-world data sets show that our proposal is very efficient.
  • Keywords
    data structures; entropy; feature extraction; text analysis; data structure design; dataset; entropy; feature collection; heuristic theory; high dimensional data; high-dimensional unlabeled data; information theory; informative collection; informative feature set selection; pruning strategy; real-world data set; textual collection; Algorithm design and analysis; Data structures; Entropy; Feature extraction; Joints; Telecommunications; Upper bound; Feature Selection; Unsupervised; high dimensions;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Tools with Artificial Intelligence (ICTAI), 2012 IEEE 24th International Conference on
  • Conference_Location
    Athens
  • ISSN
    1082-3409
  • Print_ISBN
    978-1-4799-0227-9
  • Type

    conf

  • DOI
    10.1109/ICTAI.2012.149
  • Filename
    6495166
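
The abstract describes forward selection of a feature set scored by the entropy of candidate sets. The following is a minimal sketch of that general idea, not the authors' method: it assumes discrete feature values, scores a candidate set by its empirical joint entropy, and adds one feature per step. The function names `joint_entropy` and `greedy_informative_set` are hypothetical, and the paper's data structures and pruning strategy are not reproduced here because the record does not specify them.

```python
from collections import Counter
from math import log2


def joint_entropy(rows, features):
    """Empirical joint entropy (in bits) of the given feature columns."""
    # Count how often each combination of values occurs across the rows.
    counts = Counter(tuple(row[f] for f in features) for row in rows)
    n = len(rows)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def greedy_informative_set(rows, k):
    """Greedy forward selection (illustrative assumption, no pruning):
    at each step, add the feature that maximizes the joint entropy."""
    num_features = len(rows[0])
    selected = []
    for _ in range(min(k, num_features)):
        best_feature, best_entropy = None, -1.0
        for f in range(num_features):
            if f in selected:
                continue
            h = joint_entropy(rows, selected + [f])
            if h > best_entropy:
                best_feature, best_entropy = f, h
        selected.append(best_feature)
    return selected


# Toy data: columns 0 and 1 are identical (redundant); column 2 is independent.
rows = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)]
chosen = greedy_informative_set(rows, 2)
```

On this toy data the redundant copy of column 0 adds no entropy, so the second step picks the independent column instead. This naive version recomputes every candidate's entropy from scratch at each step, which is the cost the paper's data structures and pruning are said to reduce.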