Deep generative models that solve PDEs: Distributed computing for training large data-free models

Sergio Botelho, Ameya Joshi, Biswajit Khara, Vinay Rao, Soumik Sarkar, Chinmay Hegde, Santi Adavani, Baskar Ganapathysubramanian

    Research output: Chapter in Book/Report/Conference proceeding; Conference contribution

    Abstract

    Recent progress in scientific machine learning (SciML) has opened up the possibility of training novel neural network architectures that solve complex partial differential equations (PDEs). Several (nearly data-free) approaches that successfully solve PDEs have recently been reported, with examples including deep feed-forward networks, generative networks, and deep encoder-decoder networks. However, practical adoption of these approaches is limited by the difficulty of training these models, especially to make predictions at large output resolutions (≥ 1024 × 1024). Here we report on a software framework for data-parallel distributed deep learning that resolves the twin challenges of training these large SciML models in reasonable time and distributing the storage requirements. Our framework provides several out-of-the-box functionalities, including (a) loss integrity independent of the number of processes, (b) synchronized batch normalization, and (c) distributed higher-order optimization methods. We show excellent scalability of this framework on both cloud and HPC clusters, and report on the interplay between bandwidth, network topology, and bare metal vs cloud. We deploy this approach to train generative models of sizes hitherto not possible, showing that neural PDE solvers can be viably trained for practical applications. We also demonstrate that distributed higher-order optimization methods are 2-3× faster than stochastic gradient-based methods and provide minimal convergence drift with higher batch size.
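
    As an illustration of item (a), the following is a minimal sketch of one way a loss can be kept independent of the number of processes. It is not taken from the paper's framework: the function names (`shard`, `all_reduce_sum`, `global_mean_loss`, `mean_of_local_means`) are hypothetical, and the all-reduce is simulated in a single process with a plain sum. The assumption is that each worker contributes the sum and the count of its per-sample losses, so the reduced mean matches the single-process mean even when shards are uneven.

```python
from typing import List, Sequence


def shard(items: Sequence[float], num_workers: int) -> List[List[float]]:
    """Split items into contiguous, possibly uneven, shards (one per worker)."""
    base, extra = divmod(len(items), num_workers)
    shards, start = [], 0
    for rank in range(num_workers):
        size = base + (1 if rank < extra else 0)
        shards.append(list(items[start:start + size]))
        start += size
    return shards


def all_reduce_sum(values: Sequence[float]) -> float:
    """Single-process stand-in (assumption) for a sum all-reduce across workers."""
    return sum(values)


def global_mean_loss(per_sample_losses: Sequence[float], num_workers: int) -> float:
    """Reduce local sums and local counts separately, then divide once."""
    shards = shard(per_sample_losses, num_workers)
    local_sums = [sum(s) for s in shards]
    local_counts = [len(s) for s in shards]
    return all_reduce_sum(local_sums) / all_reduce_sum(local_counts)


def mean_of_local_means(per_sample_losses: Sequence[float], num_workers: int) -> float:
    """Naive alternative: average the per-worker means (depends on the sharding)."""
    shards = [s for s in shard(per_sample_losses, num_workers) if s]
    return sum(sum(s) / len(s) for s in shards) / len(shards)


if __name__ == "__main__":
    losses = [1.0, 2.0, 3.0, 4.0, 10.0]
    for workers in (1, 2, 3):
        print(workers, global_mean_loss(losses, workers), mean_of_local_means(losses, workers))
```

    In this sketch the sum-and-count reduction returns the same value for any worker count, whereas averaging the local means changes whenever the shards are uneven.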

    Original language: English (US)
    Title of host publication: Proceedings of 2020 IEEE/ACM Workshop on Machine Learning in High Performance Computing Environments, MLHPC 2020 and Workshop on Artificial Intelligence and Machine Learning for Scientific Applications, AI4S 2020 - Held in conjunction with SC 2020
    Subtitle of host publication: The International Conference for High Performance Computing, Networking, Storage and Analysis
    Publisher: Institute of Electrical and Electronics Engineers Inc.
    Pages: 50-63
    Number of pages: 14
    ISBN (Electronic): 9780738110783
    State: Published - Nov 2020
    Event: 6th IEEE/ACM Workshop on Machine Learning in High Performance Computing Environments, MLHPC 2020 and 1st Workshop on Artificial Intelligence and Machine Learning for Scientific Applications, AI4S 2020 - Virtual, Online, United States
    Duration: Nov 12 2020 → …

    Publication series

    Name: Proceedings of 2020 IEEE/ACM Workshop on Machine Learning in High Performance Computing Environments, MLHPC 2020 and Workshop on Artificial Intelligence and Machine Learning for Scientific Applications, AI4S 2020 - Held in conjunction with SC 2020: The International Conference for High Performance Computing, Networking, Storage and Analysis

    Conference

    Conference: 6th IEEE/ACM Workshop on Machine Learning in High Performance Computing Environments, MLHPC 2020 and 1st Workshop on Artificial Intelligence and Machine Learning for Scientific Applications, AI4S 2020
    Country: United States
    City: Virtual, Online
    Period: 11/12/20 → …

    Keywords

    • Cloud vs HPC
    • Deep generative models
    • Distributed training
    • Higher-order optimization
    • Loss functions
    • PDEs

    ASJC Scopus subject areas

    • Artificial Intelligence
    • Computer Science Applications
