• DocumentCode
    2213406
  • Title
    Dimensionality reduction by random mapping: fast similarity computation for clustering
  • Author
    Kaski, Samuel
  • Author_Institution
    Neural Networks Res. Centre, Helsinki Univ. of Technol., Espoo, Finland
  • Volume
    1
  • fYear
    1998
  • fDate
    4-8 May 1998
  • Firstpage
    413
  • Abstract
    When the data vectors are high-dimensional, it is computationally infeasible to use data analysis or pattern recognition algorithms that repeatedly compute similarities or distances in the original data space. It is therefore necessary to reduce the dimensionality before, for example, clustering the data. If the dimensionality is very high, as in the WEBSOM method, which organizes textual document collections on a self-organizing map, then even commonly used dimensionality reduction methods such as principal component analysis may be too costly. It is demonstrated that the document classification accuracy obtained after the dimensionality has been reduced using a random mapping method will be almost as good as the original accuracy if the final dimensionality is sufficiently large (about 100 out of 6000). In fact, it can be shown that the inner product (similarity) between the mapped vectors closely follows the inner product of the original vectors.
  • Keywords
    data analysis; document image processing; pattern matching; self-organising feature maps; WEBSOM method; clustering; data vectors; dimensionality reduction; pattern recognition; random mapping method; self-organizing map; similarity computation; Clustering algorithms; Computer networks; Feature extraction; Multidimensional systems; Neural networks; Organizing; Space technology; Vectors
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Neural Networks Proceedings, 1998. IEEE World Congress on Computational Intelligence. The 1998 IEEE International Joint Conference on
  • Conference_Location
    Anchorage, AK
  • ISSN
    1098-7576
  • Print_ISBN
    0-7803-4859-1
  • Type
    conf
  • DOI
    10.1109/IJCNN.1998.682302
  • Filename
    682302
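
The abstract's central claim, that the inner product between randomly mapped vectors closely follows the inner product of the original vectors, can be illustrated with a small sketch. This is not the paper's implementation: it assumes a random matrix with Gaussian entries whose columns are normalized to unit length, and the test vectors, the seed, and the helper names (`random_mapping_matrix`, `project`, `cosine`) are hypothetical. Only the dimensionalities, 6000 reduced to 100, are taken from the abstract.

```python
import math
import random


def random_mapping_matrix(d_in, d_out, rng):
    """Return d_in columns, each a random unit vector of length d_out.

    Assumption: Gaussian entries with unit-length columns; the record
    above does not specify how the random matrix is constructed.
    """
    columns = []
    for _ in range(d_in):
        col = [rng.gauss(0.0, 1.0) for _ in range(d_out)]
        norm = math.sqrt(sum(c * c for c in col))
        columns.append([c / norm for c in col])
    return columns


def project(columns, x):
    """Map x (length d_in) to its d_out-dimensional image, skipping zeros."""
    d_out = len(columns[0])
    y = [0.0] * d_out
    for xi, col in zip(x, columns):
        if xi != 0.0:
            for k in range(d_out):
                y[k] += xi * col[k]
    return y


def cosine(a, b):
    """Cosine similarity: inner product of the length-normalized vectors."""
    dot = sum(p * q for p, q in zip(a, b))
    norm_a = math.sqrt(sum(p * p for p in a))
    norm_b = math.sqrt(sum(q * q for q in b))
    return dot / (norm_a * norm_b)


rng = random.Random(0)          # fixed seed so the run is repeatable
D_IN, D_OUT = 6000, 100         # dimensionalities quoted in the abstract
R = random_mapping_matrix(D_IN, D_OUT, rng)

# Two hypothetical sparse vectors with 50 nonzero entries each, 25 shared.
x = [1.0 if i < 50 else 0.0 for i in range(D_IN)]
y = [1.0 if 25 <= i < 75 else 0.0 for i in range(D_IN)]

cos_original = cosine(x, y)
cos_mapped = cosine(project(R, x), project(R, y))
print("original:", cos_original, "mapped:", cos_mapped)
```

The mapped similarity is a random quantity, so it will not match the original exactly; the gap tends to shrink as the target dimensionality grows, which is consistent with the abstract's condition that the final dimensionality be sufficiently large.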