Efficient utilization of GPGPU cache hierarchy

Mahmoud Khairy, Mohamed Zahran, Amr G. Wassal

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


Recent GPUs are equipped with general-purpose L1 and L2 caches in an attempt to reduce memory bandwidth demand and improve the performance of some irregular GPGPU applications. However, due to massive multithreading, GPGPU caches suffer from severe resource contention and low data sharing, which may degrade performance instead. In this work, we propose three techniques to efficiently utilize and improve the performance of GPGPU caches. The first technique aims to dynamically detect and bypass memory accesses that show streaming behavior. In the second technique, we propose dynamic warp throttling via cores sampling (DWT-CS) to alleviate cache thrashing by throttling the number of active warps per core. DWT-CS monitors the misses per kilo-instructions (MPKI) at L1; when it exceeds a specific threshold, all GPU cores are sampled with different numbers of active warps to find the optimal number of warps that mitigates thrashing and achieves the highest performance. Our proposed third technique addresses the problem of GPU cache associativity, since many GPGPU applications suffer from severe associativity stalls and conflict misses. Prior work proposed cache bypassing on associativity stalls. In this work, instead of bypassing, we employ a better cache indexing function, Pseudo Random Interleaving Cache (PRIC), that is based on polynomial modulus mapping, in order to fairly and evenly distribute memory accesses over cache sets. The proposed techniques improve the average performance of streaming and contention applications by 1.2X and 2.3X, respectively. Compared to prior work, they achieve 1.7X and 1.5X performance improvement over Cache-Conscious Wavefront Scheduler and Memory Request Prioritization Buffer, respectively.
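
The polynomial modulus mapping behind PRIC can be illustrated with a minimal Python sketch. The abstract does not state the polynomial or the cache geometry, so the 32-set cache, the polynomial x^5 + x^2 + 1, and all function names below are assumptions chosen for illustration, not the authors' configuration. The sketch treats a block address as a polynomial over GF(2) and uses the remainder of its division by an irreducible polynomial as the set index, then compares it with conventional modulo indexing on a strided access pattern.

```python
# Illustrative sketch only: the polynomial, the number of sets, and the
# function names are assumptions, not taken from the paper.

ASSUMED_POLY = 0b100101   # x^5 + x^2 + 1, irreducible over GF(2)
ASSUMED_NUM_SETS = 32     # 2^5 sets, matching the degree of the polynomial


def poly_mod_index(block_addr: int, poly: int = ASSUMED_POLY) -> int:
    """Set index = remainder of block_addr(x) / poly(x) over GF(2)."""
    degree = poly.bit_length() - 1
    rem = block_addr
    # Long division over GF(2): cancel the leading term until the
    # remainder has a lower degree than the divisor.
    while rem.bit_length() > degree:
        shift = rem.bit_length() - 1 - degree
        rem ^= poly << shift
    return rem


def conventional_index(block_addr: int, num_sets: int = ASSUMED_NUM_SETS) -> int:
    """Conventional indexing: the low-order bits of the block address."""
    return block_addr % num_sets


def sets_touched(index_fn, stride: int, count: int = 32) -> int:
    """Number of distinct sets hit by `count` accesses with a fixed stride."""
    return len({index_fn(i * stride) for i in range(count)})


if __name__ == "__main__":
    # A stride equal to the number of sets is the worst case for
    # conventional indexing: every access lands in the same set.
    print("conventional:", sets_touched(conventional_index, stride=32))
    print("polynomial:  ", sets_touched(poly_mod_index, stride=32))
```

Under these assumptions, the strided pattern collapses onto a single set with conventional indexing but spreads over all 32 sets with the polynomial mapping, which is the kind of conflict-miss reduction the abstract attributes to better indexing. In hardware, such a remainder is typically computed with a small XOR tree over the address bits rather than by iterative division.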

Original language: English (US)
Title of host publication: ACM International Conference Proceeding Series
Editors: Xiang Gong
Publisher: Association for Computing Machinery
Number of pages: 12
ISBN (Electronic): 9781450334075
State: Published - Feb 7 2015
Event: 8th Annual Workshop on General Purpose Processing using Graphics Processing Unit, GPGPU 2015 - San Francisco, United States
Duration: Feb 7 2015 → …

Publication series

Name: ACM International Conference Proceeding Series


Conference: 8th Annual Workshop on General Purpose Processing using Graphics Processing Unit, GPGPU 2015
Country/Territory: United States
City: San Francisco
Period: 2/7/15 → …


  • Cache bypassing
  • Cache management
  • Conflict-avoiding
  • Warp throttling

ASJC Scopus subject areas

  • Software
  • Human-Computer Interaction
  • Computer Vision and Pattern Recognition
  • Computer Networks and Communications

