TY - GEN
T1 - Hybrid compiler and microarchitecture technique for cache traffic optimization
AU - Zahran, Mohamed
AU - Bhowmik, Anasua
PY - 2005
Y1 - 2005
N2 - The memory system is one of the main performance-limiting factors in contemporary processors because of the gap between memory speed and processor speed. This gap has driven designers to move as much memory as possible from off-chip to on-chip, and the sustained increase in the number of devices integrated per chip makes a large on-chip memory feasible. However, cache memories are starting to give diminishing returns. One of the main reasons is the delay in writing back the data of a replaced block to memory or to the next-level cache, which makes block replacement time-consuming and therefore degrades overall performance. In this paper, we present a hybrid compiler-microarchitecture technique for solving the cache traffic problem. The microarchitecture part deals with bandwidth management: it predicts the time after which a dirty cache block will no longer be written before its replacement and writes the block back to memory during a period of low traffic. Thus, when the block is replaced, it is already clean and the replacement completes much faster. The compiler part deals with bandwidth saving: it detects values that are dead and hence do not need to be written to memory at all, which reduces the traffic to memory and makes replacement faster. We show that the proposed techniques reduce writebacks from the L1 cache by 24% for SpecINT and 18% for SpecFP. Moreover, around half of the dirty blocks are cleaned during low-traffic periods, before their actual replacement time.
AB - The memory system is one of the main performance-limiting factors in contemporary processors because of the gap between memory speed and processor speed. This gap has driven designers to move as much memory as possible from off-chip to on-chip, and the sustained increase in the number of devices integrated per chip makes a large on-chip memory feasible. However, cache memories are starting to give diminishing returns. One of the main reasons is the delay in writing back the data of a replaced block to memory or to the next-level cache, which makes block replacement time-consuming and therefore degrades overall performance. In this paper, we present a hybrid compiler-microarchitecture technique for solving the cache traffic problem. The microarchitecture part deals with bandwidth management: it predicts the time after which a dirty cache block will no longer be written before its replacement and writes the block back to memory during a period of low traffic. Thus, when the block is replaced, it is already clean and the replacement completes much faster. The compiler part deals with bandwidth saving: it detects values that are dead and hence do not need to be written to memory at all, which reduces the traffic to memory and makes replacement faster. We show that the proposed techniques reduce writebacks from the L1 cache by 24% for SpecINT and 18% for SpecFP. Moreover, around half of the dirty blocks are cleaned during low-traffic periods, before their actual replacement time.
UR - http://www.scopus.com/inward/record.url?scp=33744473365&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33744473365&partnerID=8YFLogxK
U2 - 10.1109/INTERACT.2005.8
DO - 10.1109/INTERACT.2005.8
M3 - Conference contribution
AN - SCOPUS:33744473365
SN - 0769523218
SN - 9780769523217
T3 - Proceedings - Annual Workshop on Interaction between Compilers and Computer Architectures, INTERACT
SP - 58
EP - 69
BT - Proceedings - 9th Annual Workshop on Interaction between Compilers and Computer Architectures INTERACT-9, in conjunction with the 11th Int. Symp. on High-performance Comput. Architecture, HPCA-11
T2 - 9th Annual Workshop on Interaction between Compilers and Computer Architectures INTERACT-9, in conjunction with the 11th International Symposium on High-performance Computer Architecture, HPCA-11
Y2 - 13 February 2005 through 13 February 2005
ER -