A theoretical-empirical approach to estimating sample complexity of DNNs

Devansh Bisla, Apoorva Nandini Saridena, Anna Choromanska

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This paper focuses on understanding how the generalization error of deep neural networks (DNNs) scales with the amount of training data. Existing techniques in statistical learning theory require the computation of capacity measures, such as the VC dimension, to provably bound this error. It is, however, unclear how to extend these measures to DNNs, and therefore the existing analyses apply only to simple neural networks that are not used in practice, e.g., linear or shallow (at most two-layer) networks, or otherwise to multi-layer perceptrons. Moreover, many theoretical error bounds are not empirically verifiable. In this paper we derive estimates of the generalization error that hold for deep networks and do not rely on unattainable capacity measures. Our approach hinges on two major assumptions: i) the network achieves zero training error, and ii) the probability of making an error on a test point is proportional to the distance between this point and its nearest training point in the feature space, saturating at a certain maximal distance that we call the radius. Based on these assumptions we estimate the generalization error of DNNs. The obtained estimate scales as $\mathcal{O}\left(\frac{1}{\delta N^{1/d}}\right)$, where N is the size of the training data, and is parameterized by two quantities, the effective dimensionality of the data as perceived by the network (d) and the aforementioned radius (δ), both of which we find empirically. We show that our estimates match the experimentally observed behavior of the error on multiple learning tasks using benchmark datasets and realistic models. Estimating training data requirements is essential for the deployment of safety-critical applications such as autonomous driving and medical diagnostics. Furthermore, collecting and annotating training data requires substantial financial, computational, and human resources. Our empirical estimates will help allocate these resources efficiently.
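
The scaling law above is easy to experiment with numerically. Below is a minimal Python sketch, not the authors' code: it pairs the abstract's formula $\mathcal{O}\left(\frac{1}{\delta N^{1/d}}\right)$ with a standard nearest-neighbor (Levina-Bickel MLE) intrinsic-dimension estimator as a stand-in for the paper's empirical procedure for finding d; the `features` array, the choice of k, and the value δ = 0.5 are all illustrative assumptions, not values from the paper.

    # Sketch of the scaling estimate described in the abstract:
    # generalization error ~ O(1/(delta * N^(1/d))), with d the effective
    # dimensionality of the data in feature space and delta the saturation
    # radius. The dimension estimator is a generic nearest-neighbor heuristic,
    # used here only as a placeholder for the paper's empirical procedure.
    import numpy as np
    from scipy.spatial import cKDTree

    def estimate_effective_dim(features: np.ndarray, k: int = 10) -> float:
        """Levina-Bickel MLE intrinsic-dimension estimate from k-NN distances."""
        tree = cKDTree(features)
        # Query k+1 neighbors: the nearest neighbor of each point is itself.
        dists, _ = tree.query(features, k=k + 1)
        dists = dists[:, 1:]  # drop the zero self-distance column
        # d_hat(x) = (k-1) / sum_{j<k} log(r_k(x) / r_j(x)), averaged over x.
        log_ratios = np.log(dists[:, -1:] / dists[:, :-1])
        per_point = (k - 1) / np.sum(log_ratios, axis=1)
        return float(np.mean(per_point))

    def error_estimate(n_train: int, d: float, delta: float) -> float:
        """Generalization-error scaling O(1/(delta * N^(1/d))) from the abstract."""
        return 1.0 / (delta * n_train ** (1.0 / d))

    # Usage on synthetic feature vectors (hypothetical stand-in for DNN
    # feature-space embeddings of the training set):
    rng = np.random.default_rng(0)
    features = rng.normal(size=(5000, 64))
    d_hat = estimate_effective_dim(features)
    for n in (1_000, 10_000, 100_000):
        print(n, error_estimate(n, d=d_hat, delta=0.5))

Note how a larger effective dimension d flattens the decay: with d = 16, a tenfold increase in N shrinks the estimate by only a factor of 10^{1/16} ≈ 1.15, which is why the effective (rather than ambient) dimensionality drives training-data requirements.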

Original language: English (US)
Title of host publication: Proceedings - 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2021
Publisher: IEEE Computer Society
Pages: 3264-3274
Number of pages: 11
ISBN (Electronic): 9781665448994
DOIs
State: Published - Jun 2021
Event: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2021 - Virtual, Online, United States
Duration: Jun 19, 2021 – Jun 25, 2021

Publication series

Name: IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops
ISSN (Print): 2160-7508
ISSN (Electronic): 2160-7516

Conference

Conference: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2021
Country/Territory: United States
City: Virtual, Online
Period: 6/19/21 – 6/25/21

ASJC Scopus subject areas

  • Computer Vision and Pattern Recognition
  • Electrical and Electronic Engineering
