TY - GEN

T1 - Privately releasing conjunctions and the statistical query barrier

AU - Gupta, Anupam

AU - Hardt, Moritz

AU - Roth, Aaron

AU - Ullman, Jonathan

PY - 2011

Y1 - 2011

N2 - Suppose we would like to know all answers to a set of statistical queries C on a data set up to small error, but we can only access the data itself using statistical queries. A trivial solution is to exhaustively ask all queries in C. Can we do any better? We show that the number of statistical queries necessary and sufficient for this task is, up to polynomial factors, equal to the agnostic learning complexity of C in Kearns' statistical query (SQ) model. This gives a complete answer to the question when running time is not a concern. We then show that the problem can be solved efficiently (allowing arbitrary error on a small fraction of queries) whenever the answers to C can be described by a submodular function. This includes many natural concept classes, such as graph cuts and Boolean disjunctions and conjunctions. While interesting from a learning-theoretic point of view, our main applications are in privacy-preserving data analysis: Here, our second result leads to an algorithm that efficiently releases differentially private answers to all Boolean conjunctions with 1% average error. This presents progress on a key open problem in privacy-preserving data analysis. Our first result, on the other hand, gives unconditional lower bounds on any differentially private algorithm that admits a (potentially non-privacy-preserving) implementation using only statistical queries. Not only our algorithms, but also most known private algorithms can be implemented using only statistical queries, and hence are constrained by these lower bounds. Our result therefore isolates the complexity of agnostic learning in the SQ model as a new barrier in the design of differentially private algorithms.

AB - Suppose we would like to know all answers to a set of statistical queries C on a data set up to small error, but we can only access the data itself using statistical queries. A trivial solution is to exhaustively ask all queries in C. Can we do any better? We show that the number of statistical queries necessary and sufficient for this task is, up to polynomial factors, equal to the agnostic learning complexity of C in Kearns' statistical query (SQ) model. This gives a complete answer to the question when running time is not a concern. We then show that the problem can be solved efficiently (allowing arbitrary error on a small fraction of queries) whenever the answers to C can be described by a submodular function. This includes many natural concept classes, such as graph cuts and Boolean disjunctions and conjunctions. While interesting from a learning-theoretic point of view, our main applications are in privacy-preserving data analysis: Here, our second result leads to an algorithm that efficiently releases differentially private answers to all Boolean conjunctions with 1% average error. This presents progress on a key open problem in privacy-preserving data analysis. Our first result, on the other hand, gives unconditional lower bounds on any differentially private algorithm that admits a (potentially non-privacy-preserving) implementation using only statistical queries. Not only our algorithms, but also most known private algorithms can be implemented using only statistical queries, and hence are constrained by these lower bounds. Our result therefore isolates the complexity of agnostic learning in the SQ model as a new barrier in the design of differentially private algorithms.

KW - agnostic learning

KW - differential privacy

KW - submodular functions

UR - http://www.scopus.com/inward/record.url?scp=79959740503&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=79959740503&partnerID=8YFLogxK

U2 - 10.1145/1993636.1993742

DO - 10.1145/1993636.1993742

M3 - Conference contribution

AN - SCOPUS:79959740503

SN - 9781450306911

T3 - Proceedings of the Annual ACM Symposium on Theory of Computing

SP - 803

EP - 812

BT - STOC'11 - Proceedings of the 43rd ACM Symposium on Theory of Computing

PB - Association for Computing Machinery

T2 - 43rd ACM Symposium on Theory of Computing, STOC 2011

Y2 - 6 June 2011 through 8 June 2011

ER -