Title :
A Multi-dimensional Comparison of Toolkits for Machine Learning with Big Data
Author :
Aaron N. Richter;Taghi M. Khoshgoftaar;Sara Landset;Tawfiq Hasanin
Author_Institution :
Florida Atlantic Univ., Boca Raton, FL, USA
Abstract :
Big data is a big business, and effective modeling of this data is key. This paper provides a comprehensive multidimensional analysis of various open source tools for machine learning with big data. An evaluation standard is proposed along with detailed comparisons of the frameworks discussed, with regard to algorithm availability, scalability, speed, and more. The major tools profiled are Mahout, MLlib, H2O, and SAMOA, along with the big data processing engines they utilize, including Hadoop MapReduce, Apache Spark, and Apache Storm. There is not yet one framework that "does it all", but this paper provides insight into each tool's strengths and weaknesses along with guidance on tool choice for specific needs.
Keywords :
"Sparks","Clustering algorithms","Water","Big data","Machine learning algorithms","Data models","Engines"
Conference_Title :
Information Reuse and Integration (IRI), 2015 IEEE International Conference on
DOI :
10.1109/IRI.2015.12