Regional Information Center for Science and Technology - A comparative analysis of data sets using Machine Learning techniques

DocumentCode :

120481

Title :

A comparative analysis of data sets using Machine Learning techniques

Author :

Abhilash, C.B. ; Rohitaksha, K. ; Biradar, Shankar

Author_Institution :

Comput. Sci. & Eng., JSS Acad. of Tech. Educ., Bangalore, India

fYear :

2014

fDate :

21-22 Feb. 2014

Firstpage :

Lastpage :

Abstract :

Machine Learning techniques are widely used for clustering data. The K-means algorithm is one of the most widely used algorithms for clustering data sets, and it is easy to understand and to simulate on different datasets. In this paper we used the K-means algorithm to cluster the yeast and iris datasets, where the clustering gave lower accuracy and needed more iterations. We simulate an improved version of the K-means algorithm for clustering these datasets; the Improved K-means algorithm uses the minimum spanning tree technique. An undirected graph is generated over all the input data points and the shortest distances are then calculated, which in turn results in better accuracy with fewer iterations. Both algorithms were simulated in the Java programming language, and the results obtained from both were compared and analyzed. The algorithms were run several times with different numbers of cluster groups, and the analysis showed that the Improved K-means algorithm performed better than the K-means algorithm; the Improved K-means results also showed that its accuracy increases as the number of clusters increases. We also inferred from the results that the accuracy of the Improved K-means algorithm is optimal at a particular value of K (the number of cluster groups).
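The abstract describes standard K-means and its MST-based improvement only at a high level and does not say exactly how the minimum spanning tree feeds into K-means. The following Java sketch shows plain Lloyd's K-means alongside one common MST approach, which is assumed here rather than taken from the paper: build an MST over the complete undirected graph of point-to-point distances, delete the K-1 heaviest edges, and use the mean of each remaining component as an initial centroid. The class and method names (`KMeansSketch`, `kMeans`, `mstSeeds`) are hypothetical, and the toy data in `main` is illustrative, not the yeast or iris data.

```java
import java.util.*;

// Hypothetical sketch, not the authors' code: Lloyd's K-means plus an assumed
// MST-based seeding step standing in for the paper's "Improved K-means".
class KMeansSketch {

    // Euclidean distance between two feature vectors.
    static double dist(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += (a[i] - b[i]) * (a[i] - b[i]);
        return Math.sqrt(s);
    }

    // Runs K-means from centroids c (updated in place) and fills label[].
    // Returns the iteration at which no assignment changed, or maxIter.
    static int kMeans(double[][] x, double[][] c, int[] label, int maxIter) {
        int k = c.length, dim = x[0].length;
        Arrays.fill(label, -1);
        for (int it = 1; it <= maxIter; it++) {
            boolean changed = false;
            // Assignment step: each point joins its nearest centroid.
            for (int i = 0; i < x.length; i++) {
                int best = 0;
                for (int j = 1; j < k; j++)
                    if (dist(x[i], c[j]) < dist(x[i], c[best])) best = j;
                if (best != label[i]) { label[i] = best; changed = true; }
            }
            if (!changed) return it;
            // Update step: each centroid moves to the mean of its points.
            double[][] sum = new double[k][dim];
            int[] cnt = new int[k];
            for (int i = 0; i < x.length; i++) {
                cnt[label[i]]++;
                for (int d = 0; d < dim; d++) sum[label[i]][d] += x[i][d];
            }
            for (int j = 0; j < k; j++)
                if (cnt[j] > 0)
                    for (int d = 0; d < dim; d++) c[j][d] = sum[j][d] / cnt[j];
        }
        return maxIter;
    }

    // Union-find lookup with path halving.
    static int find(int[] root, int v) {
        while (root[v] != v) { root[v] = root[root[v]]; v = root[v]; }
        return v;
    }

    // Assumed seeding: build an MST (Prim) over the complete undirected graph of
    // Euclidean distances, delete its k-1 heaviest edges, and return the mean of
    // each of the k resulting components. Requires 1 <= k <= x.length.
    static double[][] mstSeeds(double[][] x, int k) {
        int n = x.length, dim = x[0].length;
        int[] parent = new int[n];
        double[] key = new double[n]; // after Prim: weight of edge (v, parent[v])
        boolean[] inTree = new boolean[n];
        Arrays.fill(key, Double.POSITIVE_INFINITY);
        key[0] = 0; parent[0] = -1;
        for (int step = 0; step < n; step++) {
            int u = -1;
            for (int v = 0; v < n; v++)
                if (!inTree[v] && (u < 0 || key[v] < key[u])) u = v;
            inTree[u] = true;
            for (int v = 0; v < n; v++) {
                double w = dist(x[u], x[v]);
                if (!inTree[v] && w < key[v]) { key[v] = w; parent[v] = u; }
            }
        }
        // Each vertex 1..n-1 owns one MST edge; mark the k-1 heaviest as cut.
        Integer[] order = new Integer[n - 1];
        for (int v = 1; v < n; v++) order[v - 1] = v;
        Arrays.sort(order, (a, b) -> Double.compare(key[b], key[a]));
        boolean[] cut = new boolean[n];
        for (int i = 0; i < k - 1; i++) cut[order[i]] = true;
        // Joining the n-k kept edges leaves exactly k components.
        int[] root = new int[n];
        for (int v = 0; v < n; v++) root[v] = v;
        for (int v = 1; v < n; v++)
            if (!cut[v]) root[find(root, v)] = find(root, parent[v]);
        // Average the points of each component to get the seed centroids.
        Map<Integer, Integer> id = new HashMap<>();
        double[][] c = new double[k][dim];
        int[] cnt = new int[k];
        for (int v = 0; v < n; v++) {
            int r = find(root, v);
            Integer j = id.get(r);
            if (j == null) { j = id.size(); id.put(r, j); }
            cnt[j]++;
            for (int d = 0; d < dim; d++) c[j][d] += x[v][d];
        }
        for (int j = 0; j < k; j++)
            for (int d = 0; d < dim; d++) c[j][d] /= cnt[j];
        return c;
    }

    public static void main(String[] args) {
        double[][] x = {{0, 0}, {0, 1}, {1, 0}, {10, 10}, {10, 11}, {11, 10}};
        int[] label = new int[x.length];
        int iters = kMeans(x, mstSeeds(x, 2), label, 100);
        System.out.println(Arrays.toString(label) + " after " + iters + " iterations");
    }
}
```

Because `kMeans` returns its iteration count, the same data can be clustered once from K randomly chosen points and once from `mstSeeds`, and the two counts compared, which mirrors the kind of accuracy-versus-iterations comparison the abstract reports. Prim's algorithm on the complete graph costs O(n²) distance evaluations, which is manageable at the size of the iris and yeast datasets but would call for a sparser neighbour graph on much larger inputs.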

Keywords :

Java; data analysis; learning (artificial intelligence); pattern clustering; trees (mathematics); Java programming language; clustering groups; data clustering; data sets comparative analysis; improved K-means algorithm; machine learning techniques; minimum spanning tree technique; undirected graph; yeast dataset; Accuracy; Algorithm design and analysis; Bioinformatics; Clustering algorithms; Genomics; Iris; Improved K-Means; K-Means; MST; Yeast dataset; iris dataset;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Advance Computing Conference (IACC), 2014 IEEE International

Conference_Location :

Gurgaon

Print_ISBN :

978-1-4799-2571-1

Type :

conf

DOI :

10.1109/IAdCC.2014.6779289

Filename :

6779289

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=120481