TY - JOUR
T1 - PAC learning with irrelevant attributes
AU - Dhagat, Aditi
AU - Hellerstein, Lisa
N1 - Funding Information:
*Supported in part by NSF grant CCR-92-10957
Publisher Copyright:
© 1994 IEEE
PY - 1994
Y1 - 1994
N2 - We consider the problem of learning in the presence of irrelevant attributes in Valiant's PAC model [V84]. In the PAC model, the goal of the learner is to produce an approximately correct hypothesis from random sample data. If the number of relevant attributes in the target function is small, it may be desirable to produce a hypothesis that also depends on only a small number of variables. Haussler [H88] previously considered the problem of learning monomials of a small number of variables. He showed that the greedy set cover approximation algorithm can be used as a polynomial-time Occam algorithm for learning monomials on r of n variables. It outputs a monomial on r(ln q + 1) variables, where q is the number of negative examples in the sample. We extend this result by showing that there is a polynomial-time Occam algorithm for learning k-term DNF formulas depending on r of n variables that outputs a DNF formula depending on O(r^k log^k q) variables, where q is the number of negative examples in the sample. We also give a polynomial-time Occam algorithm for learning decision lists (sometimes called 1-decision lists) with k alternations. It outputs a decision list with k alternations depending on O(r^k log^k m) variables, where m is the size of the sample. Using recent non-approximability techniques, Hancock, Jiang, Li, and Tromp [HJLT94] have shown that, unless NP ⊆ DTIME[2^poly(log n)], decision lists with k alternations cannot be approximated within a multiplicative factor of log n, and decision lists with an unbounded number of alternations cannot be approximated in polynomial time within a multiplicative factor of 2^(log^γ n) for any γ < 1.
AB - We consider the problem of learning in the presence of irrelevant attributes in Valiant's PAC model [V84]. In the PAC model, the goal of the learner is to produce an approximately correct hypothesis from random sample data. If the number of relevant attributes in the target function is small, it may be desirable to produce a hypothesis that also depends on only a small number of variables. Haussler [H88] previously considered the problem of learning monomials of a small number of variables. He showed that the greedy set cover approximation algorithm can be used as a polynomial-time Occam algorithm for learning monomials on r of n variables. It outputs a monomial on r(ln q + 1) variables, where q is the number of negative examples in the sample. We extend this result by showing that there is a polynomial-time Occam algorithm for learning k-term DNF formulas depending on r of n variables that outputs a DNF formula depending on O(r^k log^k q) variables, where q is the number of negative examples in the sample. We also give a polynomial-time Occam algorithm for learning decision lists (sometimes called 1-decision lists) with k alternations. It outputs a decision list with k alternations depending on O(r^k log^k m) variables, where m is the size of the sample. Using recent non-approximability techniques, Hancock, Jiang, Li, and Tromp [HJLT94] have shown that, unless NP ⊆ DTIME[2^poly(log n)], decision lists with k alternations cannot be approximated within a multiplicative factor of log n, and decision lists with an unbounded number of alternations cannot be approximated in polynomial time within a multiplicative factor of 2^(log^γ n) for any γ < 1.
UR - http://www.scopus.com/inward/record.url?scp=0001797861&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0001797861&partnerID=8YFLogxK
U2 - 10.1109/SFCS.1994.365704
DO - 10.1109/SFCS.1994.365704
M3 - Conference article
AN - SCOPUS:0001797861
SN - 0272-5428
SP - 64
EP - 74
JO - Proceedings - Annual IEEE Symposium on Foundations of Computer Science, FOCS
JF - Proceedings - Annual IEEE Symposium on Foundations of Computer Science, FOCS
T2 - Proceedings of the 35th IEEE Annual Symposium on Foundations of Computer Science
Y2 - 20 November 1994 through 22 November 1994
ER -