We introduce a simple, novel method for the weakly supervised problem of part-of-speech tagging with a dictionary. Our method trains a connectionist network that simultaneously learns a distributed latent representation of words while maximizing tagging accuracy. To compensate for the unavailability of true labels, we train the model with a curriculum: instead of presenting training samples in random order, we present them in an ordered sequence, proceeding from "easier" to "harder" samples. On a standard test corpus, we show that, without using any grammatical information, our model outperforms the standard EM algorithm in tagging accuracy, and its performance is comparable to other state-of-the-art models. We also show that curriculum learning in this setting significantly improves both speed of convergence and generalization.
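The curriculum idea described above can be illustrated with a minimal sketch. This is not the paper's code; the difficulty score used here (average tag ambiguity of a sentence under the dictionary) is an assumed, hypothetical proxy for "easier" vs. "harder" samples.

```python
# Hypothetical sketch of curriculum ordering (not the paper's implementation).
# Assumption: a sentence is "easier" when its words admit fewer possible tags
# in the dictionary, i.e. lower average tag ambiguity.

def difficulty(sentence, tag_dict):
    """Score a sentence by its average tag ambiguity under the dictionary."""
    return sum(len(tag_dict.get(w, [])) for w in sentence) / len(sentence)

def curriculum_order(corpus, tag_dict):
    """Return training samples sorted from easiest to hardest."""
    return sorted(corpus, key=lambda s: difficulty(s, tag_dict))

# Toy dictionary mapping words to their allowed POS tags.
tag_dict = {
    "the": ["DET"],
    "dog": ["NN"],
    "runs": ["VBZ", "NNS"],
    "fast": ["RB", "JJ", "NN", "VB"],
}
corpus = [["the", "dog", "runs", "fast"], ["the", "dog"]]
ordered = curriculum_order(corpus, tag_dict)
# The fully unambiguous sentence ["the", "dog"] is presented first.
```

The model would then be trained on `ordered` sequentially rather than on a random shuffle of `corpus`.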