TY - JOUR
T1 - Eigen-distortions of hierarchical representations
AU - Berardino, Alexander
AU - Ballé, Johannes
AU - Laparra, Valero
AU - Simoncelli, Eero
N1 - Funding Information:
The authors would like to thank the members of the LCV and VNL groups at NYU, especially Olivier Henaff and Najib Majaj, for helpful feedback and comments on the manuscript. Additionally, we thank Rebecca Walton and Lydia Cassard for their tireless efforts in collecting the perceptual data presented here. This work was funded in part by the Howard Hughes Medical Institute, the NEI Visual Neuroscience Training Program and the Samuel J. and Joan B. Williamson Fellowship.
Publisher Copyright:
© 2017 Neural Information Processing Systems Foundation. All rights reserved.
PY - 2017
Y1 - 2017
N2 - We develop a method for comparing hierarchical image representations in terms of their ability to explain perceptual sensitivity in humans. Specifically, we utilize Fisher information to establish a model-derived prediction of sensitivity to local perturbations of an image. For a given image, we compute the eigenvectors of the Fisher information matrix with largest and smallest eigenvalues, corresponding to the model-predicted most- and least-noticeable image distortions, respectively. For human subjects, we then measure the amount of each distortion that can be reliably detected when added to the image. We use this method to test the ability of a variety of representations to mimic human perceptual sensitivity. We find that the early layers of VGG16, a deep neural network optimized for object recognition, provide a better match to human perception than later layers, and a better match than a 4-stage convolutional neural network (CNN) trained on a database of human ratings of distorted image quality. On the other hand, we find that simple models of early visual processing, incorporating one or more stages of local gain control, trained on the same database of distortion ratings, provide substantially better predictions of human sensitivity than either the CNN, or any combination of layers of VGG16.
UR - http://www.scopus.com/inward/record.url?scp=85047000082&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85047000082&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85047000082
SN - 1049-5258
VL - 2017-December
SP - 3531
EP - 3540
JO - Advances in Neural Information Processing Systems
JF - Advances in Neural Information Processing Systems
T2 - 31st Annual Conference on Neural Information Processing Systems, NIPS 2017
Y2 - 4 December 2017 through 9 December 2017
ER -