Approximate Computing (AC) has emerged as a means for improving the performance, area, and power/energy efficiency of digital designs at the cost of output quality degradation. Applications such as machine learning, e.g., those based on deep neural networks (DNNs), are highly compute-intensive and can therefore benefit significantly from AC and specialized accelerators. However, the accuracy loss introduced by approximations in the DNN accelerator hardware can lead to unacceptable outputs. This paper presents a novel method for designing high-performance DNN accelerators in which the approximation error introduced in one stage/part of the design is "completely" compensated in a subsequent stage/part, while still offering significant efficiency gains. To this end, the paper also presents a case study on improving the performance of systolic array-based hardware architectures, which are commonly used to accelerate state-of-the-art deep learning algorithms.