The memory system is one of the main performance-limiting factors in contemporary processors, due to the gap between memory speed and processor speed. This gap has driven the move of as much memory as possible from off-chip to on-chip, and the sustained growth in the number of devices integrated per chip has made large on-chip memories feasible. However, cache memories are beginning to show diminishing returns. One of the main reasons is the delay incurred in writing back the data of a replaced block to memory or to the next-level cache, which makes block replacement time-consuming and degrades overall performance. In this paper, we present a hybrid compiler-microarchitecture technique to address the cache traffic problem. The microarchitectural component manages bandwidth: it predicts the time at which a dirty cache block will no longer be written before replacement, and writes the block back to memory during periods of low traffic. When the block is later replaced, it is already clean, so the replacement completes much faster. The compiler component saves bandwidth: it detects values that are dead, and hence need not be written back to memory at all, reducing memory traffic and making replacement faster. We show that the proposed techniques reduce writebacks from the L1 cache by 24% for SPECint and 18% for SPECfp. Moreover, around half of the dirty blocks are cleaned during low-traffic periods, before their actual replacement time.
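The microarchitectural idea above can be sketched in simulation. The following is a minimal, hypothetical model (all class and method names are invented, and the simple "no write for N accesses" heuristic stands in for the paper's actual last-write predictor): a direct-mapped cache tracks how long each dirty block has gone without a write, and during idle bus cycles it eagerly writes back blocks predicted to receive no further writes, so that a later eviction finds them clean.

```python
class Block:
    """A cache block with a tag and a dirty bit."""
    def __init__(self, tag):
        self.tag = tag
        self.dirty = False


class EagerWritebackCache:
    """Direct-mapped cache sketch. A dirty block is predicted
    'write-dead' once it has seen PREDICT_IDLE consecutive accesses
    without a write (a stand-in for a real last-write predictor)."""
    PREDICT_IDLE = 4

    def __init__(self, num_sets):
        self.sets = [None] * num_sets
        self.idle = [0] * num_sets      # accesses since last write, per set
        self.early_writebacks = 0       # blocks cleaned during low traffic
        self.dirty_evictions = 0        # slow replacements (dirty victim)

    def access(self, addr, is_write):
        idx = addr % len(self.sets)
        tag = addr // len(self.sets)
        blk = self.sets[idx]
        if blk is None or blk.tag != tag:
            # Miss: a dirty victim must be written back at replacement
            # time, which is exactly the slow case the technique avoids.
            if blk is not None and blk.dirty:
                self.dirty_evictions += 1
            blk = Block(tag)
            self.sets[idx] = blk
            self.idle[idx] = 0
        if is_write:
            blk.dirty = True
            self.idle[idx] = 0
        else:
            self.idle[idx] += 1
        return blk

    def drain_during_low_traffic(self):
        """Called when the bus is idle: write back (clean) every dirty
        block the predictor believes will not be written again."""
        for i, blk in enumerate(self.sets):
            if blk is not None and blk.dirty and self.idle[i] >= self.PREDICT_IDLE:
                blk.dirty = False
                self.early_writebacks += 1
```

In this toy model, a block written once and then only read is cleaned during the next idle period, so a subsequent conflict miss on its set evicts a clean block instead of paying a writeback at replacement time.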