DocumentCode :
3658496
Title :
Unified Programming Model and Software Framework for Big Data Machine Learning and Data Analytics
Author :
Rong Gu;Yun Tang;Qianhao Dong;Zhaokang Wang;Zhiqiang Liu;Shuai Wang;Chunfeng Yuan;Yihua Huang
Author_Institution :
Collaborative Innovation Center of Novel Software Technol. &
Volume :
3
fYear :
2015
fDate :
7/1/2015
Firstpage :
562
Lastpage :
567
Abstract :
In the new era of Big Data, the rapid growth of applications such as social media and web search requires efficient and scalable machine learning and statistical analysis algorithms. However, there is a lack of easy-to-use and efficient software frameworks or systems that support the fast development of such big data analytical algorithms. To address this problem, we propose Octopus, an easy-to-use and efficient analytical system for big data. Octopus allows data analysts to conduct complex analytics on big data easily and efficiently, using traditional programming languages and methods. To achieve ease of use, we propose a matrix-based unified programming model, since matrix computation is the core of many data-intensive statistical applications such as numerical analysis and data mining. Further, to improve performance, the Octopus software framework builds on several distributed computing platforms, including Hadoop MapReduce, Spark, and MPI. On these platforms, we design several parallel matrix computation algorithms suited to different scenarios. Finally, the features of Octopus are encapsulated in a library with matrix-based APIs and exposed to users as an R package. R is a widely used statistical programming language that supports diverse data analysis tasks through extension packages. Experimental results show that Octopus achieves efficient performance and near-linear scalability.
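Illustrative sketch (not from the paper): this record does not describe Octopus's actual parallel matrix algorithms. One common way to distribute matrix multiplication on MapReduce- or Spark-style platforms is block (tile) partitioning. Each pair of compatible blocks A[i,k] and B[k,j] produces a partial product keyed by the output block (i, j), and a reduce step sums those partials. The Python code below simulates both phases in a single process. The function names (split_blocks, local_matmul, add_blocks, block_matmul) and the block size parameter bs are hypothetical and are not part of Octopus's R API.

```python
from collections import defaultdict


def split_blocks(m, bs):
    """Partition matrix m into {(block_row, block_col): tile}; edge tiles may be smaller than bs x bs."""
    blocks = {}
    for i in range(0, len(m), bs):
        for j in range(0, len(m[0]), bs):
            blocks[(i // bs, j // bs)] = [row[j:j + bs] for row in m[i:i + bs]]
    return blocks


def local_matmul(a, b):
    """Dense product of two small tiles, as one worker would compute it."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def add_blocks(x, y):
    """Element-wise sum of two equally shaped tiles."""
    return [[p + q for p, q in zip(rx, ry)] for rx, ry in zip(x, y)]


def block_matmul(a, b, bs):
    """Hypothetical block-partitioned A x B, simulating a map phase and a reduce phase."""
    a_blocks = split_blocks(a, bs)
    b_blocks = split_blocks(b, bs)

    # "Map": every compatible pair A[i,k], B[k,j] emits a partial product keyed by (i, j).
    partial = defaultdict(list)
    for (i, k), a_tile in a_blocks.items():
        for (k2, j), b_tile in b_blocks.items():
            if k == k2:
                partial[(i, j)].append(local_matmul(a_tile, b_tile))

    # "Reduce": sum all partial products that belong to the same output block.
    c_blocks = {}
    for key, parts in partial.items():
        acc = parts[0]
        for p in parts[1:]:
            acc = add_blocks(acc, p)
        c_blocks[key] = acc

    # Reassemble the output tiles into one dense result matrix.
    c = [[0.0] * len(b[0]) for _ in range(len(a))]
    for (bi, bj), tile in c_blocks.items():
        for r, row in enumerate(tile):
            for s, v in enumerate(row):
                c[bi * bs + r][bj * bs + s] = v
    return c
```

For example, `block_matmul([[1, 2, 3], [4, 5, 6]], [[7, 8], [9, 10], [11, 12]], 2)` returns `[[58, 64], [139, 154]]`. On a real cluster, the partial products would be shuffled by key and summed on different nodes. The block size controls the trade-off between per-task work and the volume of data shuffled.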
Keywords :
"Sparks","Scalability","Big data","Libraries","Programming","Machine learning algorithms","Distributed databases"
Publisher :
IEEE
Conference_Title :
Computer Software and Applications Conference (COMPSAC), 2015 IEEE 39th Annual
ISSN :
0730-3157
Type :
conf
DOI :
10.1109/COMPSAC.2015.275
Filename :
7273424