Abstract
Converting widely-available 2D images and videos, captured using an RGB camera, to 3D can help accelerate the training of machine learning systems in spatial reasoning domains ranging from in-home assistive robots to augmented reality to autonomous vehicles. However, automating this task is challenging because it requires not only accurately estimating object location and orientation, but also requires knowing currently unknown camera properties (e.g., focal length). A scalable way to combat this problem is to leverage people's spatial understanding of scenes by crowdsourcing visual annotations of 3D object properties. Unfortunately, getting people to directly estimate 3D properties reliably is difficult due to the limitations of image resolution, human motor accuracy, and people's 3D perception (i.e., humans do not "see" depth like a laser range finder). In this paper, we propose a crowd-machine hybrid approach that jointly uses crowds' approximate measurements of multiple in-scene objects to estimate the 3D state of a single target object. Our approach can generate accurate estimates of the target object by combining heterogeneous knowledge from multiple contributors regarding various different objects that share a spatial relationship with the target object. We evaluate our joint object estimation approach with 363 crowd workers and show that our method can reduce errors in the target object's 3D location estimation by over 40%, while requiring only $35$% as much human time. Our work introduces a novel way to enable groups of people with different perspectives and knowledge to achieve more accurate collective performance on challenging visual annotation tasks.
Original language | English (US) |
---|---|
Article number | 51 |
Journal | Proceedings of the ACM on Human-Computer Interaction |
Volume | 4 |
Issue number | CSCW1 |
DOIs | |
State | Published - May 28 2020 |
Keywords
- 3D pose estimation
- answer aggregation
- computer vision
- crowdsourcing
- human computation
- optimization
- soft constraints
ASJC Scopus subject areas
- Social Sciences (miscellaneous)
- Human-Computer Interaction
- Computer Networks and Communications