• DocumentCode
    178627
  • Title
    Privileged Information for Hierarchical Document Clustering: A Metric Learning Approach
  • Author
    Marcondes Marcacini, R.; Domingues, M.A.; Hruschka, E.R.; Oliveira Rezende, S.
  • Author_Institution
    Fed. Univ. of Mato Grosso do Sul (UFMS), Tres Lagoas, Brazil
  • fYear
    2014
  • fDate
    24-28 Aug. 2014
  • Firstpage
    3636
  • Lastpage
    3641
  • Abstract
    Traditional hierarchical text clustering methods assume that documents are represented only by "technical information", i.e., keywords, phrases, expressions, and named entities that can be extracted directly from the texts. However, in many scenarios there is additional, valuable information about the documents that is usually disregarded during the clustering task, such as user-validated tags, annotations and comments from experts, dictionaries, and domain ontologies. Recently, Vapnik introduced a new learning paradigm called Learning Using Privileged Information (LUPI), which allows this additional (privileged) information to be incorporated in a supervised learning setting. We investigated the incorporation of privileged information in an unsupervised setting. The key idea of our proposed approach is to extract important relationships among documents represented in the privileged information space and use them to learn a more accurate metric for text clustering in the technical information space. A thorough experimental evaluation indicates that incorporating privileged information through metric learning significantly improves hierarchical clustering accuracy.
  • Keywords
    learning (artificial intelligence); ontologies (artificial intelligence); pattern clustering; text analysis; LUPI; dictionaries ontologies; domain ontologies; hierarchical document clustering; hierarchical text clustering methods; learning using privileged information; metric learning approach; privileged information; supervised learning setting; technical information; text clustering; Accuracy; Clustering algorithms; Clustering methods; Data mining; Feature extraction; Measurement; Partitioning algorithms
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Pattern Recognition (ICPR), 2014 22nd International Conference on
  • Conference_Location
    Stockholm
  • ISSN
    1051-4651
  • Type
    conf
  • DOI
    10.1109/ICPR.2014.625
  • Filename
    6977337
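
The key idea stated in the abstract (use relationships found in the privileged information space to learn a metric for the technical information space) can be illustrated with a minimal sketch. This is not the authors' algorithm, which the record does not describe. It assumes a simple stand-in: document pairs are marked similar or dissimilar by cosine similarity over privileged features, and a diagonal (per-feature) weighting of the technical features is derived from those pairs. All function names, thresholds, and toy data below are hypothetical.

```python
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv) if nu and nv else 0.0


def pairs_from_privileged(priv, hi=0.8, lo=0.2):
    """Derive similar/dissimilar document pairs from the privileged space.
    The thresholds `hi` and `lo` are illustrative assumptions."""
    similar, dissimilar = [], []
    for i, j in combinations(range(len(priv)), 2):
        s = cosine(priv[i], priv[j])
        if s >= hi:
            similar.append((i, j))
        elif s <= lo:
            dissimilar.append((i, j))
    return similar, dissimilar


def learn_diagonal_metric(tech, similar, dissimilar, eps=1e-9):
    """Heuristic stand-in for metric learning: weight a technical feature
    highly when it varies across dissimilar pairs but not similar pairs."""
    def mean_sq_diff(pairs, k):
        if not pairs:
            return 0.0
        return sum((tech[i][k] - tech[j][k]) ** 2 for i, j in pairs) / len(pairs)

    raw = []
    for k in range(len(tech[0])):
        s, d = mean_sq_diff(similar, k), mean_sq_diff(dissimilar, k)
        raw.append(d / (s + d + eps))
    total = sum(raw) or 1.0
    return [r / total for r in raw]


def weighted_distance(u, v, w):
    """Euclidean distance with per-feature weights w."""
    return sum(wk * (a - b) ** 2 for wk, a, b in zip(w, u, v)) ** 0.5


def single_link_merges(points, dist):
    """Naive single-linkage agglomerative clustering; returns the merge order."""
    clusters = [(i,) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        a, b = min(
            combinations(clusters, 2),
            key=lambda ab: min(dist(points[i], points[j])
                               for i in ab[0] for j in ab[1]),
        )
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
        merges.append((a, b))
    return merges


# Toy data (hypothetical): documents 0,1 share one tag and 2,3 share another.
# Technical feature 0 is noise with respect to the tags; feature 1 tracks them.
tech = [[1.0, 0.0], [0.0, 0.1], [1.0, 1.0], [0.0, 0.9]]
priv = [[1, 0], [1, 0], [0, 1], [0, 1]]

similar, dissimilar = pairs_from_privileged(priv)
w = learn_diagonal_metric(tech, similar, dissimilar)


def euclid(u, v):
    return weighted_distance(u, v, [1.0] * len(u))


def learned(u, v):
    return weighted_distance(u, v, w)


print("weights:", w)
print("first Euclidean merge:", single_link_merges(tech, euclid)[0])
print("first two learned-metric merges:", single_link_merges(tech, learned)[:2])
```

On this toy data the unweighted distance first merges documents 1 and 3, which carry different tags, while the learned weighting groups the tag-consistent pairs first. Note that the privileged features are used only to learn the weights; the clustering itself runs on the technical features alone. A fuller treatment would learn a general (non-diagonal) metric and use a library hierarchical clustering routine.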