Abstract
Programming accelerators such as GPUs with low-level APIs and languages such as OpenCL and CUDA is difficult, error-prone, and not performance-portable. Automatic parallelization and domain specific languages (DSLs) have been proposed to hide complexity and regain performance portability. We present PENCIL, a rigorously-defined subset of GNU C99, enriched with additional language constructs, that enables compilers to exploit parallelism and produce highly optimized code when targeting accelerators. PENCIL aims to serve both as a portable implementation language for libraries, and as a target language for DSL compilers. We implemented a PENCIL-to-OpenCL backend using a state-of-the-art polyhedral compiler. The polyhedral compiler, extended to handle data-dependent control flow and non-affine array accesses, generates optimized OpenCL code. To demonstrate the potential and performance portability of PENCIL and the PENCIL-to-OpenCL compiler, we consider a number of image processing kernels, a set of benchmarks from the Rodinia and SHOC suites, and DSL embedding scenarios for linear algebra (BLAS) and signal processing radar applications (SpearDE), and present experimental results for four GPU platforms: AMD Radeon HD 5670 and R9 285, NVIDIA GTX 470, and ARM Mali-T604.
Original language | English (US) |
---|---|
Article number | 7429301 |
Pages (from-to) | 138-149 |
Number of pages | 12 |
Journal | Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT |
State | Published - 2015 |
Event | 24th International Conference on Parallel Architecture and Compilation, PACT 2015 - San Francisco, United States. Duration: Oct 18 2015 → Oct 21 2015 |
Keywords
- Automatic optimization
- Domain specific languages
- Intermediate language
- OpenCL
- Polyhedral model
ASJC Scopus subject areas
- Software
- Theoretical Computer Science
- Hardware and Architecture