Fast visual recognition in the mammalian cortex seems to be a hierarchical process by which the representation of the visual world is transformed in multiple stages from low-level retinotopic features to high-level, global and invariant features, and to object categories. Every single step in this hierarchy seems to be subject to learning. How does the visual cortex learn such hierarchical representations by just looking at the world? How could computers learn such representations from data? Computer vision models that are weakly inspired by the visual cortex will be described. A number of unsupervised learning algorithms to train these models will be presented, which are based on the sparse auto-encoder concept. The effectiveness of these algorithms for learning invariant feature hierarchies will be demonstrated with a number of practical tasks such as scene parsing, pedestrian detection, and object classification.