This paper proposes a new method, which we call VisualBackProp, for visualizing which sets of pixels of the input image contribute most to the predictions made by a convolutional neural network (CNN). The method relies on the intuition that, moving deeper into the network, the feature maps contain progressively less information that is irrelevant to the prediction. The technique is designed for CNN-based systems that steer self-driving cars and therefore must run in real time. This speed makes the proposed visualization method a valuable debugging tool that can easily be used during both training and inference. We justify our approach with theoretical arguments and confirm that the proposed method identifies sets of input pixels, rather than individual pixels, that collaboratively contribute to the prediction. We apply the proposed visualization tool to PilotNet, the NVIDIA neural-network-based end-to-end learning system for autonomous driving. We demonstrate that VisualBackProp determines which elements in the road image most influence PilotNet's steering decision and indeed captures relevant objects on the road. The empirical evaluation furthermore shows the plausibility of the proposed approach on public road video data as well as in other applications, and reveals that it compares favorably to the layer-wise relevance propagation approach, i.e., it obtains similar visualization results while achieving order-of-magnitude speed-ups.
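The intuition that deeper feature maps retain mostly prediction-relevant information suggests a simple masking procedure. The NumPy sketch below illustrates it under stated assumptions. It assumes post-ReLU feature maps are already available as arrays of shape `(channels, H, W)`, ordered from the shallowest to the deepest layer. Each layer's maps are averaged over channels. Starting at the deepest layer, the running mask is repeatedly upsampled and multiplied pointwise with the next shallower averaged map, so that only regions active at every depth survive. The function names `visual_backprop_sketch` and `upsample_nearest` are hypothetical. Nearest-neighbour upsampling is a simplifying stand-in for the scaling step, and the full method's choice may differ.

```python
import numpy as np


def upsample_nearest(mask, target_shape):
    """Nearest-neighbour resize of a 2-D mask (simplifying assumption)."""
    h, w = mask.shape
    th, tw = target_shape
    rows = np.arange(th) * h // th  # source row index for each target row
    cols = np.arange(tw) * w // tw  # source column index for each target column
    return mask[np.ix_(rows, cols)]


def visual_backprop_sketch(feature_maps, input_shape):
    """Hypothetical sketch of the VisualBackProp masking idea.

    feature_maps: list of (channels, H_l, W_l) post-ReLU activations,
                  ordered from the shallowest to the deepest layer.
    input_shape:  (H, W) of the input image.
    Returns a relevance mask of shape input_shape scaled to [0, 1].
    """
    # Average each layer's feature maps over the channel axis.
    averaged = [fm.mean(axis=0) for fm in feature_maps]

    # Start from the deepest layer, whose maps hold the least irrelevant detail.
    mask = averaged[-1]
    for avg in reversed(averaged[:-1]):
        # Bring the mask to this layer's resolution, then keep only regions
        # that are also active here (pointwise product).
        mask = upsample_nearest(mask, avg.shape) * avg

    # Scale the final mask to the input resolution.
    mask = upsample_nearest(mask, input_shape)

    # Normalize to [0, 1]; a constant mask carries no localization information.
    lo, hi = mask.min(), mask.max()
    if hi > lo:
        return (mask - lo) / (hi - lo)
    return np.zeros_like(mask)
```

For example, if only the top-left unit of a 2×2 deepest map is active while the shallower maps are uniformly active, the resulting mask highlights only the top-left quadrant of the input. Because the procedure needs one forward pass plus a few averages, resizes, and products, it is cheap compared with a full backward relevance pass, which is consistent with the real-time requirement stated above.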