One test of a new training algorithm is how well it generalizes from the training data to the test data. It is shown that a new training algorithm, termed double backpropagation, improves generalization by simultaneously minimizing the normal energy term found in backpropagation and an additional energy term related to the sum of the squares of the input derivatives (gradients). In normal backpropagation training, minimizing the energy function tends to push the input gradients toward zero. However, this is not always possible. Double backpropagation explicitly pushes the input gradients toward zero, which broadens the minimum and improves generalization on the test data. The authors show the improvement over normal backpropagation on four candidate architectures, using a training set of 320 handwritten numbers and a test set of 180.
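The combined objective described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a single tanh unit rather than any of the four architectures, a squared-error energy, a hypothetical weighting factor `lam` on the extra term, and a factor of 1/2 on the sum of squared input gradients. All function names are made up for illustration, and central finite differences stand in for the second backward pass so that the example needs only the standard library.

```python
import math


def forward(w, b, x):
    """Single tanh unit (an illustrative stand-in model): y = tanh(w . x + b)."""
    return math.tanh(sum(wi * xi for wi, xi in zip(w, x)) + b)


def energy(w, b, x, t):
    """Normal backpropagation energy: half the squared output error."""
    return 0.5 * (forward(w, b, x) - t) ** 2


def input_gradient(w, b, x, t):
    """Analytic dE/dx_i for the tanh unit: (y - t) * (1 - y^2) * w_i."""
    y = forward(w, b, x)
    g = (y - t) * (1.0 - y * y)
    return [g * wi for wi in w]


def total_energy(w, b, x, t, lam):
    """Normal energy plus lam times half the sum of squared input gradients.

    The weighting factor lam and the factor 1/2 are assumptions of this sketch.
    """
    penalty = 0.5 * sum(g * g for g in input_gradient(w, b, x, t))
    return energy(w, b, x, t) + lam * penalty


def mean_total_energy(w, b, data, lam):
    """Average total energy over a list of (input, target) pairs."""
    return sum(total_energy(w, b, x, t, lam) for x, t in data) / len(data)


def train_step(w, b, data, lam, lr=0.1, eps=1e-6):
    """One gradient-descent step on the mean total energy.

    Central finite differences stand in for the second backward pass,
    purely to keep the sketch free of external libraries.
    """
    params = list(w) + [b]
    grads = []
    for i in range(len(params)):
        up = params[:]
        up[i] += eps
        down = params[:]
        down[i] -= eps
        diff = (mean_total_energy(up[:-1], up[-1], data, lam)
                - mean_total_energy(down[:-1], down[-1], data, lam))
        grads.append(diff / (2.0 * eps))
    new = [p - lr * g for p, g in zip(params, grads)]
    return new[:-1], new[-1]


# Toy data (made up for illustration): two 2-D inputs with targets +1 and -1.
data = [([1.0, 0.0], 1.0), ([0.0, 1.0], -1.0)]
w, b = [0.1, -0.2], 0.0
for _ in range(200):
    w, b = train_step(w, b, data, lam=0.5)
```

Setting `lam=0.0` reduces `total_energy` to the normal energy, so the same loop can be run with and without the extra term and the resulting input gradients compared. In a real network the gradient of the penalty with respect to the weights would be obtained by differentiating through the first backward pass rather than by finite differences.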