Abstract
Efficient sorting is a key requirement for many computer science algorithms. Acceleration of existing techniques as well as developing new sorting approaches is crucial for many real-time graphics scenarios, database systems, and numerical simulations to name just a few. It is one of the most fundamental operations to organize and filter the ever growing massive amounts of data gathered on a daily basis. While optimal sorting models for serial execution on a single processor exist, efficient parallel sorting remains a challenge. In this paper, we present a hardware-optimized parallel implementation of the radix sort algorithm that results in a significant speed up over existing sorting implementations. We outperform all known General Processing Unit (GPU) based sorting systems by about a factor of two and eliminate restrictions on the sorting key space. This makes our algorithm not only the fastest, but also the first general GPU sorting solution.
Original language | English (US) |
---|---|
Pages (from-to) | 2368-2378 |
Number of pages | 11 |
Journal | Computer Graphics Forum |
Volume | 28 |
Issue number | 8 |
DOIs | |
State | Published - Dec 2009 |
Keywords
- Collision detection
- GPGPU
- GPU sorting
- HPC
- Parallel sorting
ASJC Scopus subject areas
- Computer Graphics and Computer-Aided Design