TY - JOUR
T1 - Sparsity-Specific Code Optimization using Expression Trees
AU - Herholz, Philipp
AU - Tang, Xuan
AU - Schneider, Teseo
AU - Kamil, Shoaib
AU - Panozzo, Daniele
AU - Sorkine-Hornung, Olga
N1 - Funding Information:
This work was partially supported by the NSF CAREER award under Grant No. 1652515, the NSF grants OAC-1835712, OIA-1937043, CHS-1908767, CHS-1901091, NSERC DGECR-2021-00461 and RGPIN-2021-03707, a Sloan Fellowship, a gift from Adobe Research, a gift from Advanced Micro Devices, Inc., and by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No. 101003104).
Publisher Copyright:
© 2022 Copyright held by the owner/author(s).
PY - 2022/5/13
Y1 - 2022/5/13
N2 - We introduce a code generator that converts unoptimized C++ code operating on sparse data into vectorized and parallel CPU or GPU kernels. Our approach unrolls the computation into a massive expression graph, performs redundant expression elimination, grouping, and then generates an architecture-specific kernel to solve the same problem, assuming that the sparsity pattern is fixed, which is a common scenario in many applications in computer graphics and scientific computing. We show that our approach scales to large problems and can achieve speedups of two orders of magnitude on CPUs and three orders of magnitude on GPUs, compared to a set of manually optimized CPU baselines. To demonstrate the practical applicability of our approach, we employ it to optimize popular algorithms with applications to physical simulation and interactive mesh deformation.
KW - Code optimisation
KW - sparse computation
UR - http://www.scopus.com/inward/record.url?scp=85139459815&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85139459815&partnerID=8YFLogxK
U2 - 10.1145/3520484
DO - 10.1145/3520484
M3 - Article
AN - SCOPUS:85139459815
VL - 41
JO - ACM Transactions on Graphics
JF - ACM Transactions on Graphics
SN - 0730-0301
IS - 5
M1 - 175
ER -