TY - JOUR
T1 - A data-driven model for spectra
T2 - Finding double redshifts in the Sloan Digital Sky Survey
AU - Tsalmantza, P.
AU - Hogg, David W.
N1 - Copyright 2015 Elsevier B.V., All rights reserved.
PY - 2012/7/10
Y1 - 2012/7/10
N2 - We present a data-driven method - heteroscedastic matrix factorization, a kind of probabilistic factor analysis - for modeling or performing dimensionality reduction on observed spectra or other high-dimensional data with known but non-uniform observational uncertainties. The method uses an iterative inverse-variance-weighted least-squares minimization procedure to generate a best set of basis functions. The method is similar to principal components analysis (PCA), but with the substantial advantage that it uses measurement uncertainties in a responsible way and accounts naturally for poorly measured and missing data; it models the variance in the noise-deconvolved data space. A regularization can be applied, in the form of a smoothness prior (inspired by Gaussian processes) or a non-negative constraint, without making the method prohibitively slow. Because the method optimizes a justified scalar (related to the likelihood), the basis provides a better fit to the data in a probabilistic sense than any PCA basis. We test the method on Sloan Digital Sky Survey (SDSS) spectra, concentrating on spectra known to contain two redshift components: these are spectra of gravitational lens candidates and massive black hole binaries. We apply a hypothesis test to compare one-redshift and two-redshift models for these spectra, utilizing the data-driven model trained on a random subset of all SDSS spectra. This test confirms 129 of the 131 lens candidates in our sample and all of the known binary candidates, and turns up very few false positives.
KW - black hole physics
KW - cosmology: observations
KW - gravitational lensing: strong
KW - methods: data analysis
KW - methods: statistical
KW - techniques: spectroscopic
UR - http://www.scopus.com/inward/record.url?scp=84862878902&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84862878902&partnerID=8YFLogxK
U2 - 10.1088/0004-637X/753/2/122
DO - 10.1088/0004-637X/753/2/122
M3 - Review article
AN - SCOPUS:84862878902
VL - 753
JO - Astrophysical Journal
JF - Astrophysical Journal
SN - 0004-637X
IS - 2
M1 - 122
ER -