Title :
Optimizing protocol parameters to large scale PC cluster and evaluation of its effectiveness with parallel data mining
Author :
Oguchi, Masato ; Shintani, Takahiko ; Tamura, Takayuki ; Kitsuregawa, Masaru
Author_Institution :
Inst. of Ind. Sci., Tokyo Univ., Japan
Abstract :
PC clusters have been studied intensively as next-generation large scale parallel computers. ATM technology is a strong candidate for a de facto standard in high speed communication networks. An ATM connected PC cluster is therefore a very promising platform, from the cost/performance point of view, for future high performance computing. An ATM connected PC cluster consisting of 100 PCs is reported, and characteristics of a transport layer protocol for the PC cluster are evaluated. Point-to-point communication performance is measured and discussed as the TCP window size parameter is varied. Retransmission caused by cell loss at the ATM switch is analyzed, and parameters of the retransmission mechanism suitable for parallel processing on the large scale PC cluster are clarified. From the viewpoint of applications, data intensive applications such as data mining and ad-hoc query processing in databases are considered to be very important for massively parallel processors, in addition to conventional scientific calculations. Thus, investigating the feasibility of such applications on an ATM connected PC cluster is quite meaningful. Parallel data mining is implemented and evaluated on the cluster. TCP with its default parameters cannot provide good performance, since many collisions occur during the all-to-all multicasting executed on the large scale PC cluster. With TCP parameters set according to the proposed optimization, sufficient performance improvement is achieved for parallel data mining on 100 PCs.
Keywords :
asynchronous transfer mode; knowledge acquisition; local area networks; parallel machines; performance evaluation; query processing; transport protocols; ATM connected PC cluster; ATM switch; TCP window size parameter; ad-hoc query processing; all-to-all multicasting; cell loss; data intensive applications; databases; high speed communication networks; large scale PC cluster; large scale parallel computers; parallel data mining; point-to-point communication performance; protocol parameter optimisation; retransmission; scientific calculation; transport layer protocol; Asynchronous transfer mode; Communication networks; Communication standards; Concurrent computing; Costs; Data mining; Large-scale systems; Personal communication networks; Protocols; Switches
Conference_Title :
High Performance Distributed Computing, 1998. Proceedings. The Seventh International Symposium on
Conference_Location :
Chicago, IL
Print_ISBN :
0-8186-8579-4
DOI :
10.1109/HPDC.1998.709950