Fast four-way parallel radix sorting on GPUs

Linh Ha, Jens Krüger, Cláudio T. Silva

Research output: Contribution to journal › Article › peer-review


Efficient sorting is a key requirement for many computer science algorithms. Accelerating existing techniques and developing new sorting approaches are crucial for many real-time graphics scenarios, database systems, and numerical simulations, to name just a few. Sorting is one of the most fundamental operations for organizing and filtering the ever-growing amounts of data gathered on a daily basis. While optimal sorting models for serial execution on a single processor exist, efficient parallel sorting remains a challenge. In this paper, we present a hardware-optimized parallel implementation of the radix sort algorithm that results in a significant speedup over existing sorting implementations. We outperform all known graphics processing unit (GPU) based sorting systems by about a factor of two and eliminate restrictions on the sorting key space. This makes our algorithm not only the fastest, but also the first general GPU sorting solution.

Original language: English (US)
Pages (from-to): 2368-2378
Number of pages: 11
Journal: Computer Graphics Forum
Issue number: 8
State: Published - Dec 2009

Keywords


  • Collision detection
  • GPU sorting
  • HPC
  • Parallel sorting

ASJC Scopus subject areas

  • Computer Graphics and Computer-Aided Design

