TY - JOUR
T1 - Performance of a supercomputer built with commodity components
AU - Deng, Yuefan
AU - Korobka, Alex
N1 - Funding Information:
We wish to thank Dr. J. Glimm, Dr. F. Tangerman, Mr. E. Roman, and Mr. G. Smith for their help at various stages of the Galaxy project. Financial support from New York State, ARO, and NSF is greatly appreciated.
PY - 2001/1
Y1 - 2001/1
N2 - We built a supercomputer called Galaxy by connecting Intel Pentium-based computer nodes with Fast and Gigabit Ethernet switches. Each node has two processors at clock speeds varying from 300 to 600 MHz, up to 512 MB of memory, and a small 2 Gb local disk. All nodes run standard RedHat Linux, and inter-node communication is handled by the Message Passing Interface (MPI). Local tools were written to visualize system performance and to balance loads. We benchmarked a sub-Galaxy with 72 processors using the NAS and Parallel LINPACK benchmark suites. We achieved 16.9 Gflops in a standard single-precision LU decomposition of a 46848×46848 matrix in the Parallel LINPACK benchmark. A Galaxy with 128 processors costs approximately $250 000 and delivers 40 Gflops of performance. This gives a cost-performance ratio of 160 Kflops per dollar, which is expected to improve further as processor speeds and network bandwidth increase at similar cost. Our final system with 512 processors is expected to reach several Tflops. This article first describes the architectural details of Galaxy, and then presents and analyzes its performance in terms of floating-point number crunching, network bandwidth, and I/O throughput.
AB - We built a supercomputer called Galaxy by connecting Intel Pentium-based computer nodes with Fast and Gigabit Ethernet switches. Each node has two processors at clock speeds varying from 300 to 600 MHz, up to 512 MB of memory, and a small 2 Gb local disk. All nodes run standard RedHat Linux, and inter-node communication is handled by the Message Passing Interface (MPI). Local tools were written to visualize system performance and to balance loads. We benchmarked a sub-Galaxy with 72 processors using the NAS and Parallel LINPACK benchmark suites. We achieved 16.9 Gflops in a standard single-precision LU decomposition of a 46848×46848 matrix in the Parallel LINPACK benchmark. A Galaxy with 128 processors costs approximately $250 000 and delivers 40 Gflops of performance. This gives a cost-performance ratio of 160 Kflops per dollar, which is expected to improve further as processor speeds and network bandwidth increase at similar cost. Our final system with 512 processors is expected to reach several Tflops. This article first describes the architectural details of Galaxy, and then presents and analyzes its performance in terms of floating-point number crunching, network bandwidth, and I/O throughput.
UR - http://www.scopus.com/inward/record.url?scp=0035056683&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0035056683&partnerID=8YFLogxK
U2 - 10.1016/S0167-8191(00)00090-9
DO - 10.1016/S0167-8191(00)00090-9
M3 - Article
AN - SCOPUS:0035056683
SN - 0167-8191
VL - 27
SP - 91
EP - 108
JO - Parallel Computing
JF - Parallel Computing
IS - 1-2
ER -