• DocumentCode
    643186
  • Title
    Efficient parallelization of batch pattern training algorithm on many-core and cluster architectures
  • Author
    Turchenko, Volodymyr; Bosilca, George; Bouteiller, Aurelien; Dongarra, Jack
  • Author_Institution
    Innovative Comput. Lab., Univ. of Tennessee, Knoxville, TN, USA
  • Volume
    02
  • fYear
    2013
  • fDate
    12-14 Sept. 2013
  • Firstpage
    692
  • Lastpage
    698
  • Abstract
    This paper presents experimental research on the parallel batch pattern back-propagation training algorithm, applied to a recirculation neural network, on many-core high-performance computing systems. The choice of a recirculation neural network over the multilayer perceptron, recurrent, and radial basis neural networks is justified. The model of a recirculation neural network and the usual sequential batch pattern algorithm for its training are described theoretically. An algorithmic description of the parallel version of the batch pattern training method is presented. The experiments were carried out using the Open MPI, MVAPICH, and Intel MPI message passing libraries. The results obtained on a many-core AMD system and on the Intel MIC are compared with those obtained on a cluster system. Our results show that parallelization efficiency is about 95% on 12 cores located inside one physical AMD processor for the considered minimum and maximum scenarios, and about 70-75% on 48 AMD cores for the same scenarios. These results are 15-36% higher (depending on the version of the MPI library) than those obtained on 48 cores of a cluster system. The parallelization efficiency obtained on the Intel MIC architecture is surprisingly low and calls for deeper analysis.
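    The batch pattern scheme the abstract refers to distributes training patterns across workers, lets each worker accumulate a partial weight gradient over its own subset, and then sums the partial gradients (in the paper this reduction is done with MPI). A minimal sketch of that idea, assuming a tiny linear autoencoder as a stand-in for the recirculation network — the network, data, and all names here are illustrative, not the authors' implementation:

    ```python
    # Hypothetical sketch: batch pattern parallelization splits patterns across
    # workers; summing the per-worker partial gradients reproduces the
    # sequential full-batch gradient (the MPI all-reduce step is simulated by
    # a plain Python sum over worker slices).
    import random

    random.seed(0)

    N_IN, N_HID = 4, 2          # illustrative layer sizes (autoencoder-style)
    N_PATTERNS, N_WORKERS = 8, 4

    patterns = [[random.uniform(-1, 1) for _ in range(N_IN)]
                for _ in range(N_PATTERNS)]
    # tied encoder/decoder weights W[j][i]
    W = [[random.uniform(-0.5, 0.5) for _ in range(N_IN)] for _ in range(N_HID)]

    def pattern_gradient(x):
        """Gradient of 0.5*||x_hat - x||^2 for one pattern, tied weights W."""
        h = [sum(W[j][i] * x[i] for i in range(N_IN)) for j in range(N_HID)]
        x_hat = [sum(W[j][i] * h[j] for j in range(N_HID)) for i in range(N_IN)]
        e = [x_hat[i] - x[i] for i in range(N_IN)]
        # decoder path contributes e[i]*h[j]; encoder path (tied W) contributes
        # (sum_k e[k]*W[j][k]) * x[i]
        return [[e[i] * h[j] + sum(e[k] * W[j][k] for k in range(N_IN)) * x[i]
                 for i in range(N_IN)] for j in range(N_HID)]

    def add(a, b):
        return [[a[j][i] + b[j][i] for i in range(N_IN)] for j in range(N_HID)]

    zero = [[0.0] * N_IN for _ in range(N_HID)]

    # sequential batch gradient: sum over all patterns on one core
    g_seq = zero
    for x in patterns:
        g_seq = add(g_seq, pattern_gradient(x))

    # "parallel" version: each worker sums over its own slice of patterns,
    # then the partial gradients are reduced (summed) across workers
    g_par = zero
    for w in range(N_WORKERS):
        g = zero
        for x in patterns[w::N_WORKERS]:
            g = add(g, pattern_gradient(x))
        g_par = add(g_par, g)
    ```

    The reduced gradient `g_par` matches the sequential `g_seq` up to floating-point summation order, which is why the batch pattern approach parallelizes the training step without changing its result.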
  • Keywords
    application program interfaces; batch processing (computers); learning (artificial intelligence); message passing; multilayer perceptrons; multiprocessing systems; parallel architectures; workstation clusters; AMD cores; Intel MIC architecture; Intel MPI message passing libraries; MPI library; Mvapich; Open MPI; batch pattern training algorithm; batch pattern training method; cluster architectures; cluster system; many-core AMD system; many-core architectures; many-core high performance computing systems; multilayer perceptron; parallel batch pattern back propagation training algorithm; parallel version; parallelization; physical AMD processor; radial basis neural networks; recirculation neural network; sequential batch pattern algorithm; Algorithm design and analysis; Artificial neural networks; Clustering algorithms; Lips; Microwave integrated circuits; Neurons; Training; many-core system; parallel batch pattern training; parallelization efficiency; recirculation neural network;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    2013 IEEE 7th International Conference on Intelligent Data Acquisition and Advanced Computing Systems (IDAACS)
  • Conference_Location
    Berlin
  • Print_ISBN
    978-1-4799-1426-5
  • Type
    conf
  • DOI
    10.1109/IDAACS.2013.6663014
  • Filename
    6663014