## Abstract

Suppose we would like to know all answers to a set of statistical queries C on a data set up to small error, but we can access the data itself only through statistical queries. A trivial solution is to exhaustively ask all queries in C. In this paper, we investigate how and when we can do better than this naïve approach. We show that the number of statistical queries necessary and sufficient for this task is, up to polynomial factors, equal to the agnostic learning complexity of C in Kearns' statistical query (SQ) model. This completely answers the question when running time is not a concern. We then show that the problem can be solved efficiently (allowing arbitrary error on a small fraction of queries) whenever the answers to C can be described by a submodular function. This includes many natural concept classes, such as graph cuts and Boolean disjunctions and conjunctions. These results are interesting not only from a learning-theoretic point of view but also from the perspective of privacy-preserving data analysis. In this context, our second result leads to an algorithm that efficiently releases differentially private answers to all Boolean conjunctions with 1% average error. This represents significant progress on a key open problem in privacy-preserving data analysis. Our first result, on the other hand, gives unconditional lower bounds on any differentially private algorithm that admits a (potentially non-privacy-preserving) implementation using only statistical queries. Not only our algorithms but also most known private algorithms can be implemented using only statistical queries, and hence they are constrained by these lower bounds. Our result therefore isolates the complexity of agnostic learning in the SQ model as a new barrier in the design of differentially private algorithms.
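To make the setting concrete, here is a minimal Python sketch under assumptions of our own. It is not the paper's algorithm. The data set is a toy list of Boolean rows, C is the class of all monotone disjunctions over d attributes, and the oracle returns the true mean of a query plus bounded noise of magnitude at most `tau`. The helper names (`make_dataset`, `sq_oracle`, `naive_release`, `coverage`, `is_submodular`) are hypothetical. The sketch shows two things. First, the naïve approach spends one statistical query per member of C, which is 2^d queries here. Second, the answers to disjunctions form a coverage function, and coverage functions are submodular, which is the kind of structure the efficient result exploits. Conjunctions reduce to the same structure because Pr[x_{i1} ∧ … ∧ x_{ik}] = 1 − Pr[¬x_{i1} ∨ … ∨ ¬x_{ik}].

```python
import itertools
import random

# Toy setting (an assumption, not the paper's construction): n Boolean rows
# over d attributes; C = all monotone disjunctions over subsets S of {0..d-1}.


def make_dataset(n, d, seed=0):
    """Hypothetical helper: n uniformly random rows in {0,1}^d."""
    rng = random.Random(seed)
    return [tuple(rng.randint(0, 1) for _ in range(d)) for _ in range(n)]


def all_subsets(d):
    """Every subset of {0, ..., d-1} as a frozenset (2^d of them)."""
    return [frozenset(c) for r in range(d + 1)
            for c in itertools.combinations(range(d), r)]


def disjunction(S):
    """Predicate q_S(x) = 1 iff x_i = 1 for some i in S (empty S gives 0)."""
    return lambda x: int(any(x[i] for i in S))


def sq_oracle(data, q, tau, rng):
    """Statistical query oracle: the mean of q over the data, returned with
    additive error at most tau (modelled here as uniform noise)."""
    mean = sum(q(x) for x in data) / len(data)
    return mean + rng.uniform(-tau, tau)


def naive_release(data, d, tau, seed=1):
    """The naive approach: one statistical query per member of C."""
    rng = random.Random(seed)
    return {S: sq_oracle(data, disjunction(S), tau, rng) for S in all_subsets(d)}


def coverage(data, S):
    """Unnormalised answer: number of rows satisfying the disjunction over S."""
    return sum(disjunction(S)(x) for x in data)


def is_submodular(data, d):
    """Check diminishing returns: for A subset of B and i not in B,
    f(A | {i}) - f(A) >= f(B | {i}) - f(B). Integer counts avoid rounding."""
    subsets = all_subsets(d)
    for B in subsets:
        for A in subsets:
            if not A <= B:
                continue
            for i in range(d):
                if i in B:
                    continue
                gain_a = coverage(data, A | {i}) - coverage(data, A)
                gain_b = coverage(data, B | {i}) - coverage(data, B)
                if gain_a < gain_b:
                    return False
    return True


if __name__ == "__main__":
    data = make_dataset(n=50, d=4)
    answers = naive_release(data, d=4, tau=0.05)
    print(len(answers))             # 16, i.e. 2^4 queries for d = 4
    print(is_submodular(data, d=4))  # True: coverage functions are submodular
```

The exhaustive check in `is_submodular` is itself exponential in d and is only meant to illustrate the property on tiny inputs. The interesting regime is when d is large enough that asking all 2^d queries is infeasible.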

| Field | Value |
|---|---|
| Original language | English (US) |
| Pages (from-to) | 1494-1520 |
| Number of pages | 27 |
| Journal | SIAM Journal on Computing |
| Volume | 42 |
| Issue number | 4 |
| DOIs | |
| State | Published - 2013 |

## Keywords

- Agnostic learning
- Differential privacy
- SQ learning

## ASJC Scopus subject areas

- General Computer Science
- General Mathematics