## Abstract

We develop a new, principled algorithm for estimating the contribution of training data points to the behavior of a deep learning model, such as a specific prediction it makes. Our algorithm estimates the AME, a quantity that measures the expected (average) marginal effect of adding a data point to a subset of the training data, sampled from a given distribution. When subsets are sampled from the uniform distribution, the AME reduces to the well-known Shapley value. Our approach is inspired by causal inference and randomized experiments: we sample different subsets of the training data to train multiple submodels, and evaluate each submodel's behavior. We then use a LASSO regression to jointly estimate the AME of each data point, based on the subset compositions. Under sparsity assumptions (k ≪ N datapoints have large AME), our estimator requires only O(k log N) randomized submodel trainings, improving upon the best prior Shapley value estimators.
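The pipeline described in the abstract (sample subsets, evaluate a submodel per subset, then fit a LASSO on the subset compositions) can be illustrated with a minimal, standard-library-only sketch. Everything here is an assumption made for illustration and is not taken from the paper: `toy_utility` stands in for training and evaluating a submodel, each point is included independently with a fixed probability of 0.5, subsets are encoded as 0/1 indicators, and the LASSO is solved with a plain coordinate-descent loop. Because the toy utility is additive, the AME of each point equals its planted effect, which makes the estimates easy to check.

```python
import random


def sample_subsets(n_points, n_subsets, p, rng):
    """Each row is a 0/1 inclusion vector; points are included independently with prob. p."""
    return [[1.0 if rng.random() < p else 0.0 for _ in range(n_points)]
            for _ in range(n_subsets)]


def toy_utility(row, effects, rng, noise=0.1):
    """Hypothetical stand-in for 'train a submodel on this subset and measure its behavior'."""
    signal = sum(effect for j, effect in effects.items() if row[j] == 1.0)
    return signal + rng.gauss(0.0, noise)


def soft_threshold(x, lam):
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_sweeps=50):
    """Minimize (1/2m)*||y - Xb||^2 + lam*||b||_1 on centered data (implicit intercept)."""
    m, n = len(X), len(X[0])
    col_mean = [sum(row[j] for row in X) / m for j in range(n)]
    y_mean = sum(y) / m
    cols = [[X[i][j] - col_mean[j] for i in range(m)] for j in range(n)]
    resid = [yi - y_mean for yi in y]          # residual for beta = 0
    col_sq = [sum(v * v for v in c) / m for c in cols]
    beta = [0.0] * n
    for _ in range(n_sweeps):
        for j in range(n):
            if col_sq[j] == 0.0:               # point always in or always out: no signal
                continue
            c = cols[j]
            # Correlation of column j with the residual that excludes its own contribution.
            rho = sum(cv * rv for cv, rv in zip(c, resid)) / m + col_sq[j] * beta[j]
            new_beta = soft_threshold(rho, lam) / col_sq[j]
            delta = new_beta - beta[j]
            if delta != 0.0:
                resid = [rv - delta * cv for rv, cv in zip(resid, c)]
                beta[j] = new_beta
    return beta


rng = random.Random(0)
N, M, P = 30, 150, 0.5                          # N points, M simulated submodels
true_effects = {3: 2.0, 11: -1.5, 27: 1.0}      # k = 3 influential points (planted)

X = sample_subsets(N, M, P, rng)
y = [toy_utility(row, true_effects, rng) for row in X]
ame_hat = lasso_coordinate_descent(X, y, lam=0.02)

# Indices of the three largest estimated effects, by magnitude.
top = sorted(range(N), key=lambda j: abs(ame_hat[j]), reverse=True)[:3]
print(sorted(top))
```

With independent inclusion and 0/1 indicators, the population regression coefficient for a point is the difference in expected utility with and without that point, which is why the fitted coefficients can be read as AME estimates in this sketch. The L1 penalty shrinks the estimates slightly toward zero, so recovered magnitudes sit a little below the planted ones.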

Original language | English (US) |
---|---|
Pages (from-to) | 13468-13504 |
Number of pages | 37 |
Journal | Proceedings of Machine Learning Research |
Volume | 162 |
State | Published - 2022 |
Event | 39th International Conference on Machine Learning, ICML 2022, Baltimore, United States (Jul 17 2022 → Jul 23 2022) |

## ASJC Scopus subject areas

- Artificial Intelligence
- Software
- Control and Systems Engineering
- Statistics and Probability