TY - JOUR
T1 - A massively parallel adaptive fast multipole method on heterogeneous architectures
AU - Lashuk, Ilya
AU - Chandramowlishwaran, Aparna
AU - Langston, Harper
AU - Nguyen, Tuan Anh
AU - Sampath, Rahul
AU - Shringarpure, Aashay
AU - Vuduc, Richard
AU - Ying, Lexing
AU - Zorin, Denis
AU - Biros, George
PY - 2012/5
Y1 - 2012/5
N2 - We describe a parallel fast multipole method (FMM) for highly nonuniform distributions of particles. We employ both distributed memory parallelism (via MPI) and shared memory parallelism (via OpenMP and GPU acceleration) to rapidly evaluate two-body nonoscillatory potentials in three dimensions on heterogeneous high performance computing architectures. We have performed scalability tests with up to 30 billion particles on 196,608 cores on the AMD/CRAY-based Jaguar system at ORNL. On a GPU-enabled system (NSF's Keeneland at Georgia Tech/ORNL), we observed 30× speedup over a single core CPU and 7× speedup over a multicore CPU implementation. By combining GPUs with MPI, we achieve less than 10 ns/particle and six digits of accuracy for a run with 48 million nonuniformly distributed particles on 192 GPUs.
AB - We describe a parallel fast multipole method (FMM) for highly nonuniform distributions of particles. We employ both distributed memory parallelism (via MPI) and shared memory parallelism (via OpenMP and GPU acceleration) to rapidly evaluate two-body nonoscillatory potentials in three dimensions on heterogeneous high performance computing architectures. We have performed scalability tests with up to 30 billion particles on 196,608 cores on the AMD/CRAY-based Jaguar system at ORNL. On a GPU-enabled system (NSF's Keeneland at Georgia Tech/ORNL), we observed 30× speedup over a single core CPU and 7× speedup over a multicore CPU implementation. By combining GPUs with MPI, we achieve less than 10 ns/particle and six digits of accuracy for a run with 48 million nonuniformly distributed particles on 192 GPUs.
UR - http://www.scopus.com/inward/record.url?scp=84860239558&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84860239558&partnerID=8YFLogxK
U2 - 10.1145/2160718.2160740
DO - 10.1145/2160718.2160740
M3 - Article
AN - SCOPUS:84860239558
SN - 0001-0782
VL - 55
SP - 101
EP - 109
JO - Communications of the ACM
JF - Communications of the ACM
IS - 5
ER -