Saliency-based sequential image attention with multiset prediction

Sean Welleck, Jialin Mao, Kyunghyun Cho, Zheng Zhang

Research output: Contribution to journal › Conference article › peer-review


Humans process visual scenes selectively and sequentially using attention. Central to models of human visual attention is the saliency map. We propose a hierarchical visual architecture that operates on a saliency map and uses a novel attention mechanism to sequentially focus on salient regions and take additional glimpses within those regions. The architecture is motivated by human visual attention, and is used for multi-label image classification on a novel multiset task, demonstrating that it achieves high precision and recall while localizing objects with its attention. Unlike conventional multi-label image classification models, the model supports multiset prediction due to a reinforcement-learning based training process that allows for arbitrary label permutation and multiple instances per label.

Original language: English (US)
Pages (from-to): 5174-5184
Number of pages: 11
Journal: Advances in Neural Information Processing Systems
State: Published - 2017
Event: 31st Annual Conference on Neural Information Processing Systems, NIPS 2017 - Long Beach, United States
Duration: Dec 4 2017 → Dec 9 2017

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Information Systems
  • Signal Processing


