Active Linear Regression for ℓp Norms and Beyond

Cameron Musco, Christopher Musco, David P. Woodruff, Taisuke Yasuda

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    Abstract

    We study active sampling algorithms for linear regression, which aim to query only a small number of entries of a target vector b and output a near minimizer to the objective function. For ℓp norm regression for any 0 < p < ∞, we give an algorithm based on Lewis weight sampling which outputs a (1+ε)-approximate solution using just O(d/ε^2) queries to b for p ∈ (0,1), O(d/ε) queries for p ∈ (1,2), and O(d^{p/2}/ε^p) queries for p ∈ (2,∞). For p ∈ (0,2), our bounds are optimal up to logarithmic factors, thus settling the query complexity for this range of p. For p ∈ (2,∞), our dependence on d is optimal, while our dependence on ε is off by at most a single ε factor, up to logarithmic factors. Our result resolves an open question of Chen and Dereziński, who gave near optimal bounds for the ℓ1 norm, but required at least d^2/ε^2 samples for ℓp regression with p ∈ (1,2), and gave no bounds for p ∈ (2,∞) or p ∈ (0,1). We also provide the first total sensitivity upper bound for loss functions with at most degree-p polynomial growth. This improves a recent result of Tukan, Maalouf, and Feldman. By combining this with our techniques for ℓp regression, we obtain the first active regression algorithms for such loss functions, including the important cases of the Tukey and Huber losses. This answers another question of Chen and Dereziński. Our sensitivity bounds also give improvements to a variety of previous results using sensitivity sampling, including Orlicz norm subspace embeddings, robust subspace approximation, and dimension reduction for smoothed p-norms. Finally, our active sampling results give the first sublinear time algorithms for Kronecker product regression under every ℓp norm. Previous results required reading the entire b vector in the kernel feature space. (Extended abstract; full version available at https://arxiv.org/abs/2111.04888.)
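    To make the sampling idea concrete: for p = 2, the ℓp Lewis weights reduce to ordinary leverage scores, and active regression amounts to querying only the entries of b at rows sampled proportionally to their leverage. The sketch below is a hypothetical illustration of that special case, not the paper's algorithm or analysis; the names active_l2_regression and query_b are assumptions introduced here for the example.

    ```python
    import numpy as np

    def active_l2_regression(A, query_b, num_samples, rng=None):
        """Minimal sketch of active least-squares regression via leverage
        score sampling (the p = 2 case of Lewis weight sampling).

        A           : (n, d) design matrix, fully known.
        query_b     : callable i -> b[i]; each call is one query to the target.
        num_samples : number of entries of b to query.
        """
        rng = np.random.default_rng() if rng is None else rng
        n, d = A.shape

        # Leverage scores: squared row norms of an orthonormal basis for col(A).
        Q, _ = np.linalg.qr(A)
        lev = np.sum(Q ** 2, axis=1)        # nonnegative, sums to d
        probs = lev / lev.sum()

        # Sample row indices proportionally to leverage, with replacement.
        idx = rng.choice(n, size=num_samples, p=probs)

        # Query only the sampled entries of b, and rescale rows by
        # 1/sqrt(m * p_i) so the subsampled objective is an unbiased
        # estimate of ||Ax - b||_2^2.
        w = 1.0 / np.sqrt(num_samples * probs[idx])
        A_s = A[idx] * w[:, None]
        b_s = np.array([query_b(i) for i in idx]) * w

        # Solve the reweighted least-squares problem on the sample.
        x_hat, *_ = np.linalg.lstsq(A_s, b_s, rcond=None)
        return x_hat
    ```

    For general p, the leverage scores would be replaced by ℓp Lewis weights (computed iteratively) and the sampled objective by a reweighted ℓp loss; the abstract's query bounds concern how few such samples suffice for a (1+ε)-approximation.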

    Original language: English (US)
    Title of host publication: Proceedings - 2022 IEEE 63rd Annual Symposium on Foundations of Computer Science, FOCS 2022
    Publisher: IEEE Computer Society
    Pages: 744-753
    Number of pages: 10
    ISBN (Electronic): 9781665455190
    DOIs
    State: Published - 2022
    Event: 63rd IEEE Annual Symposium on Foundations of Computer Science, FOCS 2022 - Denver, United States
    Duration: Oct 31, 2022 – Nov 3, 2022

    Publication series

    Name: Proceedings - Annual IEEE Symposium on Foundations of Computer Science, FOCS
    Volume: 2022-October
    ISSN (Print): 0272-5428

    Conference

    Conference: 63rd IEEE Annual Symposium on Foundations of Computer Science, FOCS 2022
    Country/Territory: United States
    City: Denver
    Period: 10/31/22 – 11/3/22

    Keywords

    • active learning
    • linear regression

    ASJC Scopus subject areas

    • General Computer Science
