TY - GEN
T1 - Parallel Block-Delayed Sequences
AU - Westrick, Sam
AU - Rainey, Mike
AU - Anderson, Daniel
AU - Blelloch, Guy E.
N1 - Publisher Copyright:
© 2022 Owner/Author.
PY - 2022/4/2
Y1 - 2022/4/2
N2 - Programming languages using functions on collections of values, such as map, reduce, scan and filter, have been used for over fifty years. Such collections have proven to be particularly useful in the context of parallelism because such functions are naturally parallel. However, if implemented naively they lead to the generation of temporary intermediate collections that can significantly increase memory usage and runtime. To avoid this pitfall, many approaches use "fusion"to combine operations and avoid temporary results. However, most of these approaches involve significant changes to a compiler and are limited to a small set of functions, such as maps and reduces. In this paper we present a library-based approach that fuses widely used operations such as scans, filters, and flattens. In conjunction with existing techniques, this covers most of the common operations on collections. Our approach is based on a novel technique which parallelizes over blocks, with streams within each block. We demonstrate the approach by implementing libraries targeting multicore parallelism in two languages: Parallel ML and C++, which have very different semantics and compilers. To help users understand when to use the approach, we define a cost semantics that indicates when fusion occurs and how it reduces memory allocations. We present experimental results for a dozen benchmarks that demonstrate significant reductions in both time and space. In most cases the approach generates code that is near optimal for the machines it is running on.
AB - Programming languages using functions on collections of values, such as map, reduce, scan and filter, have been used for over fifty years. Such collections have proven to be particularly useful in the context of parallelism because such functions are naturally parallel. However, if implemented naively they lead to the generation of temporary intermediate collections that can significantly increase memory usage and runtime. To avoid this pitfall, many approaches use "fusion"to combine operations and avoid temporary results. However, most of these approaches involve significant changes to a compiler and are limited to a small set of functions, such as maps and reduces. In this paper we present a library-based approach that fuses widely used operations such as scans, filters, and flattens. In conjunction with existing techniques, this covers most of the common operations on collections. Our approach is based on a novel technique which parallelizes over blocks, with streams within each block. We demonstrate the approach by implementing libraries targeting multicore parallelism in two languages: Parallel ML and C++, which have very different semantics and compilers. To help users understand when to use the approach, we define a cost semantics that indicates when fusion occurs and how it reduces memory allocations. We present experimental results for a dozen benchmarks that demonstrate significant reductions in both time and space. In most cases the approach generates code that is near optimal for the machines it is running on.
KW - collections
KW - functional programming
KW - fusion
KW - parallel programming
UR - http://www.scopus.com/inward/record.url?scp=85127593032&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85127593032&partnerID=8YFLogxK
U2 - 10.1145/3503221.3508434
DO - 10.1145/3503221.3508434
M3 - Conference contribution
AN - SCOPUS:85127593032
T3 - Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP
SP - 61
EP - 75
BT - PPoPP 2022 - Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
PB - Association for Computing Machinery
T2 - 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2022
Y2 - 2 April 2022 through 6 April 2022
ER -