Supporting very large models using automatic dataflow graph partitioning

Minjie Wang, Chien-Chin Huang, Jinyang Li

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This paper presents Tofu, a system that partitions very large DNN models across multiple GPU devices to reduce per-GPU memory footprint. Tofu is designed to partition a dataflow graph of fine-grained tensor operators used by platforms like MXNet and TensorFlow. In order to automatically partition each operator, we propose to describe the semantics of an operator in a simple language inspired by Halide. To optimally partition different operators in a dataflow graph, Tofu uses a recursive search algorithm that minimizes the total communication cost. Our experiments on an 8-GPU machine show that Tofu enables the training of very large CNN and RNN models. It also achieves 25% - 400% speedup over alternative approaches to train very large models.
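To make the abstract's notion of "minimizing the total communication cost" of a partitioned operator concrete, the Python sketch below compares three ways of splitting a single matrix multiplication C = A @ B across k workers and picks the cheapest. This is an illustrative assumption only: the function names, shapes, and simplified cost model are hypothetical and are not Tofu's operator-description language or its recursive search algorithm.

    # Hypothetical sketch (not Tofu's implementation): rough per-worker
    # communication cost of three 1-D partition strategies for C[m,n] = A[m,p] @ B[p,n].
    def matmul_partition_costs(m, n, p, k, bytes_per_elem=4):
        frac = (k - 1) / k  # fraction of remote data each worker must fetch/exchange
        return {
            # Split A (and C) by rows: each worker fetches the parts of B it lacks.
            "split_m": p * n * frac * bytes_per_elem,
            # Split B (and C) by columns: each worker fetches the parts of A it lacks.
            "split_n": m * p * frac * bytes_per_elem,
            # Split the reduction dim p: partial C results are all-reduced
            # (ring all-reduce moves roughly 2x the output size per worker).
            "split_p": 2 * m * n * frac * bytes_per_elem,
        }

    def best_partition(m, n, p, k):
        costs = matmul_partition_costs(m, n, p, k)
        return min(costs.items(), key=lambda kv: kv[1])

    if __name__ == "__main__":
        # Example: a tall-skinny matmul favors splitting along the large m dimension.
        strategy, cost = best_partition(m=65536, n=1024, p=1024, k=8)
        print(f"cheapest strategy: {strategy}, ~{cost / 2**20:.1f} MiB moved per worker")

Tofu applies this kind of cost comparison not to one operator in isolation but recursively over the whole dataflow graph, since the best split for one operator constrains the layouts of its neighbors.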

Original language: English (US)
Title of host publication: Proceedings of the 14th EuroSys Conference 2019
Publisher: Association for Computing Machinery, Inc
ISBN (Electronic): 9781450362818
DOIs
State: Published - Mar 25, 2019
Event: 14th European Conference on Computer Systems, EuroSys 2019 - Dresden, Germany
Duration: Mar 25, 2019 - Mar 28, 2019

Publication series

Name: Proceedings of the 14th EuroSys Conference 2019

Conference

Conference: 14th European Conference on Computer Systems, EuroSys 2019
Country/Territory: Germany
City: Dresden
Period: 3/25/19 - 3/28/19

ASJC Scopus subject areas

  • Hardware and Architecture
  • Electrical and Electronic Engineering
