We present a novel algorithm for learning multi-scale sparse representations for visual tracking. In our method, sparse codes with max pooling form a multi-scale representation that integrates spatial configuration over patches of different sizes. Unlike other sparse representation methods, ours uses both holistic and local descriptors. Within this hybrid framework, we formulate a new confidence measure that combines generative and discriminative confidence scores. We also devise an efficient template update scheme to adapt to appearance changes. Experiments demonstrate the effectiveness of our method and show that it outperforms other state-of-the-art tracking algorithms.
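The core idea of sparse codes with max pooling across scales can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the ISTA-style sparse coding step, the dictionary, and all function and parameter names (`sparse_codes`, `multiscale_pooled_feature`, `lam`, `n_iter`) are assumptions introduced here for exposition.

```python
import numpy as np

def sparse_codes(patches, dictionary, lam=0.1, n_iter=50):
    """Approximate sparse codes via iterative soft-thresholding (ISTA).
    patches: (n, d) patch features; dictionary: (k, d) atoms.
    A hypothetical stand-in for the paper's sparse coding step."""
    D = dictionary
    # Lipschitz constant of the gradient = largest eigenvalue of D D^T
    L = np.linalg.norm(D @ D.T, 2)
    A = np.zeros((patches.shape[0], D.shape[0]))
    for _ in range(n_iter):
        grad = (A @ D - patches) @ D.T        # gradient of 0.5*||A D - X||^2
        A = A - grad / L                      # gradient step
        A = np.sign(A) * np.maximum(np.abs(A) - lam / L, 0.0)  # soft-threshold
    return A

def multiscale_pooled_feature(patches_by_scale, dictionary):
    """Max-pool absolute sparse codes within each scale,
    then concatenate the per-scale pooled vectors."""
    pooled = [np.abs(sparse_codes(p, dictionary)).max(axis=0)
              for p in patches_by_scale]
    return np.concatenate(pooled)

# Usage sketch: two scales of random patch features, a random dictionary.
rng = np.random.default_rng(0)
D = rng.standard_normal((8, 16))                      # 8 atoms of dim 16
scales = [rng.standard_normal((4, 16)),               # coarse: 4 patches
          rng.standard_normal((9, 16))]               # fine: 9 patches
feature = multiscale_pooled_feature(scales, D)        # shape (16,): 8 atoms x 2 scales
```

Max pooling keeps, per dictionary atom and per scale, the strongest response over all patches at that scale, so the concatenated vector encodes spatial configuration at multiple granularities while staying translation-tolerant within each scale.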