Title :
Extension of similarity measures in VSM: From orthogonal coordinate system to affine coordinate system
Author :
Junyu Xuan ; Jie Lu ; Guangquan Zhang ; Xiangfeng Luo
Author_Institution :
Centre for Quantum Comput. & Intell. Syst. (QCIS), Univ. of Technol., Sydney, NSW, Australia
Abstract :
Similarity measures are the foundation of many research areas, e.g. information retrieval, recommender systems and machine learning algorithms. Driven by these application scenarios, a number of similarity measures have been proposed, and more continue to be proposed. Among these state-of-the-art measures, vector-based representation is widely adopted. It is based on the Vector Space Model (VSM), in which an object is represented as a vector composed of its features. The similarity between two objects is then evaluated by operations on the two corresponding vectors, such as cosine, extended Jaccard, extended Dice and so on. However, these measures assume that the features are independent of each other. This assumption is unrealistic: normally there are relations between features, e.g. the co-occurrence relations between keywords in text mining. In this paper, a space geometry-based method is proposed to extend the VSM from the orthogonal coordinate system (OVSM) to the affine coordinate system (AVSM), and OVSM is proved to be a special case of AVSM. The unit coordinate vectors of AVSM are inferred from the relations between features, which are treated as angles between these unit coordinate vectors. Finally, five different similarity measures are extended from OVSM to AVSM using the unit coordinate vectors of AVSM. Among the many applications of similarity measures, text clustering is selected as the evaluation task. Documents are represented as vectors in OVSM and AVSM, respectively. The clustering results show that AVSM outperforms OVSM.
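The abstract does not state the extended formulas, so the following is only a minimal sketch of the general idea, assuming the standard geometric fact that an inner product in non-orthogonal unit axes can be computed through the matrix of pairwise cosines between those axes. The function names (`affine_inner`, `affine_cosine`) and the example values are hypothetical and are not taken from the paper; with the identity matrix the sketch reduces to ordinary cosine similarity, matching the claim that OVSM is a special case of AVSM.

```python
import math

def affine_inner(x, y, gram):
    # Inner product of x and y when coordinates are taken along unit axes
    # whose pairwise cosines are gram[i][j] (assumed symmetric, 1.0 on the diagonal).
    n = len(x)
    return sum(x[i] * gram[i][j] * y[j] for i in range(n) for j in range(n))

def affine_cosine(x, y, gram):
    # Cosine similarity using the inner product above; returns 0.0 for a zero vector.
    denom = math.sqrt(affine_inner(x, x, gram) * affine_inner(y, y, gram))
    return affine_inner(x, y, gram) / denom if denom else 0.0

# Two documents that share no feature (illustrative values only).
doc_a = [1.0, 0.0]
doc_b = [0.0, 1.0]

# Orthogonal axes: features assumed independent (the OVSM case).
identity = [[1.0, 0.0], [0.0, 1.0]]
# Related features: axes at 60 degrees, so their cosine is 0.5 (assumed relation strength).
related = [[1.0, 0.5], [0.5, 1.0]]

ortho_sim = affine_cosine(doc_a, doc_b, identity)
affine_sim = affine_cosine(doc_a, doc_b, related)
print(ortho_sim, affine_sim)
```

In this sketch the two documents have similarity 0 under orthogonal axes but a positive similarity once the two features are treated as related, which is the effect the abstract attributes to modelling feature relations as angles.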
Keywords :
affine transforms; data mining; pattern clustering; vectors; AVSM; OVSM; affine coordinate system; evaluation criterion; information retrieval; machine learning algorithm; orthogonal coordinate system; recommender system; similarity measures; space geometry-based method; text clustering; text mining area; unit coordinate vectors; vector space model; vector-based representation; Coordinate measuring machines; Equations; Euclidean distance; Machine learning algorithms; Recommender systems; Vectors;
Conference_Title :
Neural Networks (IJCNN), 2014 International Joint Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4799-6627-1
DOI :
10.1109/IJCNN.2014.6889693