TY - GEN
T1 - Distributed Multigrid Neural Solvers on Megavoxel Domains
AU - Balu, Aditya
AU - Botelho, Sergio
AU - Khara, Biswajit
AU - Rao, Vinay
AU - Sarkar, Soumik
AU - Hegde, Chinmay
AU - Krishnamurthy, Adarsh
AU - Adavani, Santi
AU - Ganapathysubramanian, Baskar
N1 - Funding Information:
This work was partly supported by the ARPA-E DIFFERENTIATE program under grant DE-AR0001215 and by the National Science Foundation under RII grant 2019574, COALESCE grant 1954556, CM grant 1644441, and CAREER grant 1750865. This work used the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by NSF grant ACI-1548562, and the Bridges-2 system at the Pittsburgh Supercomputing Center (PSC), supported by NSF grant ACI-1445606. We also used Microsoft Azure compute resources to obtain some of the GPU performance results shown.
Publisher Copyright:
© 2021 IEEE Computer Society. All rights reserved.
PY - 2021/11/14
Y1 - 2021/11/14
AB - We consider the distributed training of large-scale neural networks that serve as PDE (partial differential equation) solvers producing full-field outputs. We specifically consider neural solvers for the generalized 3D Poisson equation over megavoxel domains. We present a scalable framework that integrates two distinct advances. First, we accelerate the training of a large model via a method analogous to the multigrid technique used in numerical linear algebra: the network is trained on a hierarchy of inputs of increasing resolution, in sequences analogous to the V-, W-, F-, and half-V cycles used in multigrid approaches. Second, in conjunction with the multigrid approach, we implement a distributed deep learning framework that significantly reduces the time to solution. We demonstrate the scalability of this approach on both GPU clusters (Azure cloud VMs) and CPU clusters (PSC Bridges-2). The framework is deployed to train a generalized 3D Poisson solver that scales well, predicting full-field output solutions at resolutions up to 512×512×512 for a high-dimensional family of inputs. This strategy opens up the possibility of fast and scalable training of neural PDE solvers on heterogeneous clusters.
KW - Distributed training
KW - Multigrid
KW - Neural PDE solvers
KW - Physics-aware neural networks
UR - http://www.scopus.com/inward/record.url?scp=85119965441&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85119965441&partnerID=8YFLogxK
U2 - 10.1145/3458817.3476218
DO - 10.1145/3458817.3476218
M3 - Conference contribution
AN - SCOPUS:85119965441
T3 - International Conference for High Performance Computing, Networking, Storage and Analysis, SC
BT - Proceedings of SC 2021
PB - IEEE Computer Society
T2 - 33rd International Conference for High Performance Computing, Networking, Storage and Analysis: Science and Beyond, SC 2021
Y2 - 14 November 2021 through 19 November 2021
ER -