A massively parallel adaptive fast multipole method on heterogeneous architectures

Ilya Lashuk, Aparna Chandramowlishwaran, Harper Langston, Tuan Anh Nguyen, Rahul Sampath, Aashay Shringarpure, Richard Vuduc, Lexing Ying, Denis Zorin, George Biros

Research output: Contribution to journal › Article › peer-review

Abstract

We describe a parallel fast multipole method (FMM) for highly nonuniform distributions of particles. We employ both distributed memory parallelism (via MPI) and shared memory parallelism (via OpenMP and GPU acceleration) to rapidly evaluate two-body nonoscillatory potentials in three dimensions on heterogeneous high performance computing architectures. We have performed scalability tests with up to 30 billion particles on 196,608 cores on the AMD/CRAY-based Jaguar system at ORNL. On a GPU-enabled system (NSF's Keeneland at Georgia Tech/ORNL), we observed 30× speedup over a single core CPU and 7× speedup over a multicore CPU implementation. By combining GPUs with MPI, we achieve less than 10 ns/particle and six digits of accuracy for a run with 48 million nonuniformly distributed particles on 192 GPUs.

Original language: English (US)
Pages (from-to): 101-109
Number of pages: 9
Journal: Communications of the ACM
Volume: 55
Issue number: 5
State: Published - May 2012

ASJC Scopus subject areas

  • Computer Science (all)
