Variational convolutional networks for human-centric annotations

Tsung Wei Ke, Che Wei Lin, Tyng Luh Liu, Davi Geiger

Research output: Chapter in Book/Report/Conference proceeding › Chapter

Abstract

Modeling how a human would annotate an image is an important and interesting task relevant to image captioning. Its main challenge is that the same visual concept may be important in some images but less salient in others. Further, the subjective viewpoint of a human annotator also plays a crucial role in finalizing the annotations. To deal with such high variability, we introduce a new deep net model that integrates a CNN with a variational auto-encoder (VAE). With the latent features embedded in a VAE, the model becomes more flexible in tackling the uncertainty of human-centric annotations. On the other hand, the supervised generalization further enables the discriminative power of the generative VAE model. The resulting model can be fine-tuned end-to-end to further improve performance on predicting visual concepts. The experimental results show that our method achieves state-of-the-art performance on two benchmark datasets, MS COCO and Flickr30K, producing mAP of 36.6 and 23.49, and PHR (Precision at Human Recall) of 49.9 and 32.04, respectively.
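
The abstract does not spell out the training objective, so the following is only a minimal sketch of how a VAE latent layer is commonly combined with a supervised multi-label loss, not the authors' implementation. It assumes a diagonal Gaussian posterior, a standard normal prior, and a linear classifier head; the names `concept_logits`, `supervised_vae_loss`, and the weight `beta` are hypothetical, and the vectors `mu` and `log_var` stand in for what a CNN encoder would output.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), elementwise."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL(N(mu, diag(sigma^2)) || N(0, I))."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, log_var))


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def multilabel_bce(logits, targets):
    """Binary cross-entropy summed over visual concepts."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for s, t in zip(logits, targets):
        p = sigmoid(s)
        total -= t * math.log(p + eps) + (1 - t) * math.log(1.0 - p + eps)
    return total


def concept_logits(z, weights, bias):
    """Hypothetical linear head: one logit per visual concept."""
    return [sum(w * zi for w, zi in zip(row, z)) + b
            for row, b in zip(weights, bias)]


def supervised_vae_loss(mu, log_var, weights, bias, targets, rng, beta=1.0):
    """Classification loss on a sampled latent plus a weighted KL term."""
    z = reparameterize(mu, log_var, rng)
    logits = concept_logits(z, weights, bias)
    return multilabel_bce(logits, targets) + beta * kl_to_standard_normal(mu, log_var)


# Toy usage: a 2-dimensional latent and 3 visual concepts (all values made up).
rng = random.Random(0)
toy_loss = supervised_vae_loss(
    mu=[0.3, -0.1],
    log_var=[-1.0, -0.5],
    weights=[[0.5, -0.2], [0.1, 0.4], [-0.3, 0.2]],
    bias=[0.0, 0.1, -0.1],
    targets=[1, 0, 1],
    rng=rng,
)
```

Sampling the latent vector rather than using a fixed feature is what would let such a model express annotator variability, while the KL term keeps the posterior close to the prior. Whether the paper weights or structures these terms this way is not stated in the abstract.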

Original language: English (US)
Title of host publication: Computer Vision - 13th Asian Conference on Computer Vision, ACCV 2016, Revised Selected Papers
Editors: Ko Nishino, Shang-Hong Lai, Vincent Lepetit, Yoichi Sato
Publisher: Springer Verlag
Pages: 120-135
Number of pages: 16
ISBN (Print): 9783319541891
DOIs: https://doi.org/10.1007/978-3-319-54190-7_8
State: Published - 2017

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 10114 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)

  • Cite this

    Ke, T. W., Lin, C. W., Liu, T. L., & Geiger, D. (2017). Variational convolutional networks for human-centric annotations. In K. Nishino, S-H. Lai, V. Lepetit, & Y. Sato (Eds.), Computer Vision - 13th Asian Conference on Computer Vision, ACCV 2016, Revised Selected Papers (pp. 120-135). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 10114 LNCS). Springer Verlag. https://doi.org/10.1007/978-3-319-54190-7_8