Air pollution monitoring and management is one of the key challenges for urban sectors, especially in developing countries. Measuring pollution levels requires significant investment in reliable and durable instrumentation and subsequent maintenance. On the other hand, there have been many attempts by researchers to develop image-based pollution measurement models, which have shown significant results and established the feasibility of the idea. But, taking image-level models to a city-level system presents new challenges, which include scarcity of high-quality annotated data and a high amount of label noise. In this paper, we present a low-cost, end-to-end system for learning pollution maps using images captured through a mobile phone. We demonstrate our system for parts of New Delhi and Ghaziabad. We use transfer learning to overcome the problem of data scarcity. We investigate the effects of label noise in detail and introduce the metric of in-interval accuracy to evaluate our models in presence of noise. We use distributed averaging to learn pollution maps and mitigate the effects of noise to some extent. We also develop haze-based interpretable models which have comparable performance to mainstream models. With only 382 images from Delhi and Ghaziabad and single-scene dataset from Beijing and Shanghai, we are able to achieve a mean absolute error of 44 µg/m3in PM2.5 concentration on a test set of 267 images and an in-interval accuracy of 67% on predictions. Going further, we learn pollution maps with a mean absolute error as low as 35 µg/m3and in-interval accuracy as high as 74% significantly mitigating the image models' error. We also show that the noise in pollution labels emerging from unreliable sensing instrumentation forms a significant barrier to the realization of an ideal air pollution monitoring system. Our code-base can be found at https://github.com/ankitbha/pollution with images.