Ziwei Ji, Matus Telgarsky, Ruicheng Xian

Research output: Contribution to conference › Paper › peer-review


This paper establishes rates of universal approximation for the shallow neural tangent kernel (NTK): network weights are only allowed microscopic changes from random initialization, which entails that activations are mostly unchanged, and the network is nearly equivalent to its linearization. Concretely, the paper has two main contributions: a generic scheme to approximate functions with the NTK by sampling from transport mappings between the initial weights and their desired values, and the construction of transport mappings via Fourier transforms. Regarding the first contribution, the proof scheme provides another perspective on how the NTK regime arises from rescaling: redundancy in the weights due to resampling allows individual weights to be scaled down. Regarding the second contribution, the most notable transport mapping asserts that roughly 1/δ^(10d) nodes are sufficient to approximate continuous functions, where δ depends on the continuity properties of the target function. By contrast, nearly the same proof yields a bound of 1/δ^(2d) for shallow ReLU networks; this gap suggests a tantalizing direction for future work, separating shallow ReLU networks and their linearization.
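The NTK regime described above can be illustrated numerically. The following sketch (not from the paper; network form, scaling, and perturbation size are illustrative assumptions) builds a wide shallow ReLU network with the common 1/√m output scaling, perturbs each weight by an amount vanishing with the width m, and checks that the change in the network's output is nearly identical to the change predicted by its linearization at initialization:

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 5, 10_000                   # input dimension, network width (assumed values)
x = rng.normal(size=d)
x /= np.linalg.norm(x)

# Random initialization: rows w_j of W0 are the hidden weights, a_j are fixed signs.
W0 = rng.normal(size=(m, d))
a = rng.choice([-1.0, 1.0], size=m)

def f(W):
    """Shallow ReLU network f(x; W) = (1/sqrt(m)) * sum_j a_j * relu(w_j . x)."""
    return float((a * np.maximum(W @ x, 0.0)).sum() / np.sqrt(m))

# Gradient of f with respect to W at init (the NTK feature map at x):
# row j is a_j * 1[w_j . x > 0] * x / sqrt(m).
grad = (a * (W0 @ x > 0.0))[:, None] * x[None, :] / np.sqrt(m)

# Microscopic perturbation: each weight moves only O(1/m), so almost no
# activation pattern 1[w_j . x > 0] flips.
U = rng.normal(size=(m, d)) / m
true_change = f(W0 + U) - f(W0)
linear_change = float((grad * U).sum())

# In the NTK regime the true change and the linearized change nearly coincide.
print(true_change, linear_change, abs(true_change - linear_change))
```

Because each row of the perturbation has norm O(1/m), only an O(1/m) fraction of the ReLU activations can flip, which is why the linearization error is far smaller than the output change itself, mirroring the "activations are mostly unchanged" statement in the abstract.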

Original language: English (US)
State: Published - 2020
Event: 8th International Conference on Learning Representations, ICLR 2020 - Addis Ababa, Ethiopia
Duration: Apr 30 2020 → …


Conference: 8th International Conference on Learning Representations, ICLR 2020
City: Addis Ababa
Period: 4/30/20 → …

ASJC Scopus subject areas

  • Education
  • Linguistics and Language
  • Language and Linguistics
  • Computer Science Applications


Title: Neural Tangent Kernels, Transportation Mappings, and Universal Approximation
