TY - JOUR
T1 - Sparse Logistic Regression for RR Lyrae versus Binaries Classification
AU - Trevisan, Piero
AU - Pasquato, Mario
AU - Carenini, Gaia
AU - Mekhaël, Nicolas
AU - Braga, Vittorio F.
AU - Bono, Giuseppe
AU - Abbas, Mohamad
N1 - Publisher Copyright: © 2023. The Author(s). Published by the American Astronomical Society.
PY - 2023/6/1
Y1 - 2023/6/1
N2 - RR Lyrae (RRL) stars are old, low-mass, radially pulsating variable stars in their core helium-burning phase. They are popular stellar tracers and primary distance indicators since they obey well-defined period-luminosity relations in the near-infrared regime. Their photometric identification is not trivial; indeed, RRL star samples can be contaminated by eclipsing binaries, especially in large data sets produced by fully automatic pipelines. Interpretable machine-learning approaches for separating eclipsing binaries from RRL stars are thus needed. Ideally, they should be able to achieve high precision in identifying RRL stars while generalizing to new data from different instruments. In this paper, we train a simple logistic regression classifier on Catalina Sky Survey (CSS) light curves. It achieves a precision of 87% at 78% recall for the RRL star class on unseen CSS light curves. It generalizes to out-of-sample data (ASAS/ASAS-SN light curves) with a precision of 85% at 96% recall. We also considered an L1-regularized version of our classifier, which reaches 90% sparsity in the light-curve features with a limited trade-off in accuracy on our CSS validation set and, remarkably, also on the ASAS/ASAS-SN light-curve test set. Logistic regression is natively interpretable, and regularization allows us to point out the parts of the light curves that matter the most in classification. We thus achieved both good generalization and full interpretability.
AB - RR Lyrae (RRL) stars are old, low-mass, radially pulsating variable stars in their core helium-burning phase. They are popular stellar tracers and primary distance indicators since they obey well-defined period-luminosity relations in the near-infrared regime. Their photometric identification is not trivial; indeed, RRL star samples can be contaminated by eclipsing binaries, especially in large data sets produced by fully automatic pipelines. Interpretable machine-learning approaches for separating eclipsing binaries from RRL stars are thus needed. Ideally, they should be able to achieve high precision in identifying RRL stars while generalizing to new data from different instruments. In this paper, we train a simple logistic regression classifier on Catalina Sky Survey (CSS) light curves. It achieves a precision of 87% at 78% recall for the RRL star class on unseen CSS light curves. It generalizes to out-of-sample data (ASAS/ASAS-SN light curves) with a precision of 85% at 96% recall. We also considered an L1-regularized version of our classifier, which reaches 90% sparsity in the light-curve features with a limited trade-off in accuracy on our CSS validation set and, remarkably, also on the ASAS/ASAS-SN light-curve test set. Logistic regression is natively interpretable, and regularization allows us to point out the parts of the light curves that matter the most in classification. We thus achieved both good generalization and full interpretability.
UR - http://www.scopus.com/inward/record.url?scp=85163993016&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85163993016&partnerID=8YFLogxK
U2 - 10.3847/1538-4357/accf8f
DO - 10.3847/1538-4357/accf8f
M3 - Article
AN - SCOPUS:85163993016
SN - 0004-637X
VL - 950
JO - Astrophysical Journal
JF - Astrophysical Journal
IS - 2
M1 - 103
ER -