TY - GEN
T1 - Toucan - A Translator for Communication Tolerant MPI Applications
AU - Martin, Sergio M.
AU - Berger, Marsha J.
AU - Baden, Scott B.
N1 - Funding Information:
ACKNOWLEDGMENTS We would like to thank our anonymous reviewers for their insightful comments and suggestions. This research was supported by the Advanced Scientific Computing Research office of the U.S. Department of Energy under contracts No. DE-FC02-12ER26118 and DE-FG02-88ER25053. Sergio Martin was supported in part by the Fulbright Foreign Student Program grant from the U.S. Department of State, and a scholarship from Universidad Nacional de La Matanza, Departamento de Ingeniería e Investigaciones Tecnológicas. Scott Baden dedicates his contributions to this paper to the memory of Lillemor Nilsson (1937-2016).
Publisher Copyright:
© 2017 IEEE.
PY - 2017/6/30
Y1 - 2017/6/30
N2 - We discuss early results with Toucan, a source-to-source translator that automatically restructures C/C++ MPI applications to overlap communication with computation. We co-designed the translator and runtime system to enable dynamic, dependence-driven execution of MPI applications, requiring only a modest amount of programmer annotation. Co-design was essential to realizing overlap through dynamic code block reordering and to avoiding the limitations of static code relocation and inlining. We demonstrate that Toucan hides significant communication in four representative applications running on up to 24K cores of NERSC's Edison platform. Using Toucan, we have hidden from 33% to 85% of the communication overhead, with performance meeting or exceeding that of painstakingly hand-written overlap variants.
AB - We discuss early results with Toucan, a source-to-source translator that automatically restructures C/C++ MPI applications to overlap communication with computation. We co-designed the translator and runtime system to enable dynamic, dependence-driven execution of MPI applications, requiring only a modest amount of programmer annotation. Co-design was essential to realizing overlap through dynamic code block reordering and to avoiding the limitations of static code relocation and inlining. We demonstrate that Toucan hides significant communication in four representative applications running on up to 24K cores of NERSC's Edison platform. Using Toucan, we have hidden from 33% to 85% of the communication overhead, with performance meeting or exceeding that of painstakingly hand-written overlap variants.
KW - Communication/Computation Overlap
KW - Data-Driven
KW - MPI
KW - Source-to-Source Translator
UR - http://www.scopus.com/inward/record.url?scp=85027693885&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85027693885&partnerID=8YFLogxK
U2 - 10.1109/IPDPS.2017.44
DO - 10.1109/IPDPS.2017.44
M3 - Conference contribution
AN - SCOPUS:85027693885
T3 - Proceedings - 2017 IEEE 31st International Parallel and Distributed Processing Symposium, IPDPS 2017
SP - 998
EP - 1007
BT - Proceedings - 2017 IEEE 31st International Parallel and Distributed Processing Symposium, IPDPS 2017
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 31st IEEE International Parallel and Distributed Processing Symposium, IPDPS 2017
Y2 - 29 May 2017 through 2 June 2017
ER -