Field-of-view prediction in 360-degree videos with attention-based neural encoder-decoder networks

Jiang Yu, Yong Liu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In this paper, we propose attention-based neural encoder-decoder networks for predicting user Field-of-View (FoV) in 360-degree videos. Our proposed prediction methods are based on the attention mechanism, which learns the weighted prediction power of historical FoV time series through end-to-end training. Attention-based neural encoder-decoder networks do not involve recursion and can therefore be highly parallelized during training. Using publicly available 360-degree head movement datasets, we demonstrate that our FoV prediction models outperform the state-of-the-art FoV prediction models, achieving lower prediction error, higher training throughput, and faster convergence. Better FoV prediction leads to reduced bandwidth consumption, better video quality, and improved user quality of experience.
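As a rough illustration of the idea the abstract describes — weighting historical FoV samples by their learned relevance to form a prediction — here is a minimal scaled dot-product attention sketch in NumPy. All function names, the toy head-movement trajectory, and the use of the latest FoV sample as the query are assumptions for illustration, not the authors' actual model:

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend(query, keys, values):
    """Scaled dot-product attention over a historical FoV time series.

    query:  (d,)   current viewing-direction features
    keys:   (T, d) historical FoV samples used for matching
    values: (T, d) historical FoV samples used for the weighted sum
    Returns the attention-weighted prediction and the weights.
    """
    d = query.shape[-1]
    scores = keys @ query / np.sqrt(d)   # relevance of each past sample, shape (T,)
    weights = softmax(scores)            # normalized attention weights over history
    return weights @ values, weights     # weighted combination of past FoV samples

# Toy historical trajectory: 16 past (yaw, pitch) angles from a smooth head movement
# (hypothetical data; real datasets log head orientation per frame)
rng = np.random.default_rng(0)
history = np.cumsum(rng.normal(0.0, 2.0, size=(16, 2)), axis=0)
query = history[-1]                      # attend from the most recent FoV sample
pred, w = attend(query, history, history)
```

In the paper's full model the queries, keys, and values would be learned projections trained end-to-end, and since each attention step depends only on the fixed history rather than on previous outputs, the computation parallelizes across time steps during training.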

Original language: English (US)
Title of host publication: Proceedings of the 11th ACM Workshop on Immersive Mixed and Virtual Environment Systems, MMVE 2019
Publisher: Association for Computing Machinery, Inc
Pages: 37-42
Number of pages: 6
ISBN (Electronic): 9781450362993
DOIs
State: Published - Jun 18 2019
Event: 11th ACM SIGMM Workshop on Immersive Mixed and Virtual Environment Systems, MMVE 2019 - Amherst, United States
Duration: Jun 18 2019 → …

Publication series

Name: Proceedings of the 11th ACM Workshop on Immersive Mixed and Virtual Environment Systems, MMVE 2019

Conference

Conference: 11th ACM SIGMM Workshop on Immersive Mixed and Virtual Environment Systems, MMVE 2019
Country: United States
City: Amherst
Period: 6/18/19 → …

Keywords

  • 360 degree videos
  • Attention
  • Encoder decoder networks
  • Field of view prediction
  • Neural networks

ASJC Scopus subject areas

  • Human-Computer Interaction
  • Software
  • Computer Graphics and Computer-Aided Design
