Reconstructing Hands in 3D with Transformers

Georgios Pavlakos, Dandan Shan, Ilija Radosavovic, Angjoo Kanazawa, David Fouhey, Jitendra Malik

Research output: Contribution to journal › Conference article › peer-review

Abstract

We present an approach that can reconstruct hands in 3D from monocular input. Our approach for Hand Mesh Recovery, HaMeR, follows a fully transformer-based architecture and can analyze hands with significantly increased accuracy and robustness compared to previous work. The key to HaMeR's success lies in scaling up both the data used for training and the capacity of the deep network for hand reconstruction. For training data, we combine multiple datasets that contain 2D or 3D hand annotations. For the deep model, we use a large-scale Vision Transformer architecture. Our final model consistently outperforms previous baselines on popular 3D hand pose benchmarks. To further evaluate the effect of our design in non-controlled settings, we annotate existing in-the-wild datasets with 2D hand keypoint annotations. On this newly collected dataset of annotations, HInt, we demonstrate significant improvements over existing baselines. We make our code, data, and models available on the project website: https://geopavlakos.github.io/hamer/. 'It is because of his being armed with hands that man is the most intelligent animal.' (Anaxagoras)
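The abstract outlines the overall design: a Vision Transformer encoder over an image crop of the hand, followed by a head that regresses hand mesh parameters. The sketch below is only an illustration of that kind of architecture, not the authors' implementation; the module names, dimensions, and the MANO-style parameter split (per-joint rotations, shape coefficients, weak-perspective camera) are assumptions for the example.

```python
# Minimal sketch of a ViT-style hand mesh regressor (illustrative only,
# not the HaMeR implementation). Dimensions and the output split are assumed.
import torch
import torch.nn as nn


class ViTHandRegressor(nn.Module):
    def __init__(self, img_size=256, patch_size=16, dim=768, depth=12, heads=12):
        super().__init__()
        num_patches = (img_size // patch_size) ** 2
        # Patch embedding: split the crop into non-overlapping patches and
        # project each one to a token of dimension `dim`.
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch_size, stride=patch_size)
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches, dim))
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, dim_feedforward=4 * dim, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=depth)
        # Hypothetical regression head: 16 hand rotations as 3x3 matrices,
        # 10 shape coefficients, and a 3-parameter weak-perspective camera.
        self.head = nn.Linear(dim, 16 * 9 + 10 + 3)

    def forward(self, images):
        # images: (B, 3, H, W) crops around a detected hand
        tokens = self.patch_embed(images).flatten(2).transpose(1, 2)  # (B, N, dim)
        tokens = self.encoder(tokens + self.pos_embed)
        feats = tokens.mean(dim=1)  # pool over patch tokens
        out = self.head(feats)
        pose = out[:, : 16 * 9].reshape(-1, 16, 3, 3)   # unnormalized rotations
        shape = out[:, 16 * 9 : 16 * 9 + 10]
        camera = out[:, -3:]
        return pose, shape, camera


if __name__ == "__main__":
    model = ViTHandRegressor()
    pose, shape, camera = model(torch.randn(2, 3, 256, 256))
    print(pose.shape, shape.shape, camera.shape)  # (2, 16, 3, 3) (2, 10) (2, 3)
```

The released models, data annotations (HInt), and the actual training setup are available on the project website linked above.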

Original language: English (US)
Pages (from-to): 9826-9836
Number of pages: 11
Journal: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
State: Published - 2024
Event: 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024 - Seattle, United States
Duration: Jun 16, 2024 - Jun 22, 2024

Keywords

  • 3D hand pose estimation
  • hand mesh recovery
  • single-image 3D

ASJC Scopus subject areas

  • Software
  • Computer Vision and Pattern Recognition
