Data-driven, Interpretable Photometric Redshifts Trained on Heterogeneous and Unrepresentative Data

Boris Leistedt, David W. Hogg

    Research output: Contribution to journal › Article › peer-review


    We present a new method for inferring photometric redshifts in deep galaxy and quasar surveys, based on a data-driven model of latent spectral energy distributions (SEDs) and a physical model of photometric fluxes as a function of redshift. This conceptually novel approach combines the advantages of both machine learning methods and template fitting methods by building template SEDs directly from the spectroscopic training data. This is made computationally tractable with Gaussian processes operating in flux-redshift space, encoding the physics of redshifts and the projection of galaxy SEDs onto photometric bandpasses. This method alleviates the need to acquire representative training data or to construct detailed galaxy SED models; it requires only that the photometric bandpasses and calibrations be known or have parameterized unknowns. The training data can consist of a combination of spectroscopic and deep many-band photometric data with reliable redshifts, which do not need to entirely spatially overlap with the target survey of interest or even involve the same photometric bands. We showcase the method on the i-magnitude-selected, spectroscopically confirmed galaxies in the COSMOS field. The model is trained on the deepest bands (from SUBARU and HST) and photometric redshifts are derived using the shallower SDSS optical bands only. We demonstrate that we obtain accurate redshift point estimates and probability distributions despite the training and target sets having very different redshift distributions, noise properties, and even photometric bands. Our model can also be used to predict missing photometric fluxes or to simulate populations of galaxies with realistic fluxes and redshifts, for example.
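    The "physical model of photometric fluxes as a function of redshift" mentioned in the abstract rests on a standard step: shift a rest-frame SED to the observed frame and integrate it against a bandpass. The following is a minimal sketch of that step only, not the authors' Gaussian process model. The function name `band_flux`, the toy step-shaped SED, and the Gaussian bandpass are all hypothetical choices made for illustration, and distance dimming is deliberately left out.

```python
import numpy as np


def _trapezoid(y, x):
    """Trapezoidal integral of y over x (kept local to avoid NumPy version differences)."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


def band_flux(rest_wave, rest_sed, band_wave, band_resp, z):
    """Mean flux density of a redshifted SED seen through one bandpass.

    rest_wave, rest_sed  : rest-frame wavelength grid and SED (f_nu, arbitrary units)
    band_wave, band_resp : observed-frame wavelength grid and filter response
    z                    : redshift

    Assumption: only the (1 + z) wavelength shift and the matching (1 + z)
    factor on f_nu are modelled; the luminosity-distance term is omitted.
    """
    # Light observed at wavelength lam was emitted at lam / (1 + z).
    sed_obs = np.interp(band_wave / (1.0 + z), rest_wave, rest_sed,
                        left=0.0, right=0.0)
    # Photon-counting weighting: response times d(lam) / lam.
    weight = band_resp / band_wave
    numerator = _trapezoid(sed_obs * weight, band_wave)
    denominator = _trapezoid(weight, band_wave)
    return (1.0 + z) * numerator / denominator


# Toy inputs (hypothetical): an SED with a break at 4000 Angstrom
# and a Gaussian bandpass centred at 5500 Angstrom.
rest_wave = np.linspace(500.0, 20000.0, 4000)
step_sed = np.where(rest_wave < 4000.0, 0.2, 1.0)
band_wave = np.linspace(5000.0, 6000.0, 200)
band_resp = np.exp(-0.5 * ((band_wave - 5500.0) / 200.0) ** 2)

for z in (0.0, 0.5, 1.0):
    flux = band_flux(rest_wave, step_sed, band_wave, band_resp, z)
    print(f"z = {z:.1f}  band flux = {flux:.3f}")
```

    As the break moves through the bandpass with increasing redshift, the band flux changes. This dependence of flux on redshift is the information a photometric redshift method exploits.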

    Original language: English (US)
    Article number: 5
    Journal: Astrophysical Journal
    Issue number: 1
    State: Published - Mar 20 2017


    • galaxies: distances and redshifts
    • large-scale structure of universe

    ASJC Scopus subject areas

    • Astronomy and Astrophysics
    • Space and Planetary Science

