TY - GEN
T1 - Large-scale algorithm design for parallel FFT-based simulations on GPUs
AU - Kulkarni, Anuva
AU - Franchetti, Franz
AU - Kovacevic, Jelena
N1 - Funding Information:
The authors would like to thank Dr. Anthony Rollett, Dr. Vahid Tari, Dr. Anirban Jana and Dr. Roberto Gomez for their assistance and collaboration. This work used the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by National Science Foundation grant number ACI-1548562.
Publisher Copyright:
© 2018 IEEE.
PY - 2018/7/2
Y1 - 2018/7/2
N2 - We describe and analyze a co-design of algorithm and software for high-performance simulation of a partial differential equation (PDE) numerical solver for large-scale datasets. Large-scale scientific simulations involving parallel Fast Fourier Transforms (FFTs) have extreme memory requirements and high communication cost. This hampers high-resolution analysis with fine grids. Moreover, it is difficult to accelerate legacy Fortran scientific codes with modern hardware such as GPUs because of GPU memory constraints. Our proposed solution uses signal processing techniques such as lossy compression and domain-local FFTs to lower iteration cost without adversely impacting the accuracy of the result. In this work, we discuss proof-of-concept results for various aspects of algorithm development.
KW - Algorithm design
KW - GPU
KW - Irregular domain decomposition
KW - Lossy compression
UR - http://www.scopus.com/inward/record.url?scp=85063099856&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85063099856&partnerID=8YFLogxK
U2 - 10.1109/GlobalSIP.2018.8646675
DO - 10.1109/GlobalSIP.2018.8646675
M3 - Conference contribution
AN - SCOPUS:85063099856
T3 - 2018 IEEE Global Conference on Signal and Information Processing, GlobalSIP 2018 - Proceedings
SP - 301
EP - 305
BT - 2018 IEEE Global Conference on Signal and Information Processing, GlobalSIP 2018 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2018 IEEE Global Conference on Signal and Information Processing, GlobalSIP 2018
Y2 - 26 November 2018 through 29 November 2018
ER -