Zero-Shot Multi-Modal Artist-Controlled Retrieval and Exploration of 3D Object Sets

Kristofer Schlachter, Benjamin Ahlbrand, Zhu Wang, Ken Perlin, Valerio Ortenzi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

When creating 3D content, highly specialized skills are generally needed to design and generate models of objects and other assets by hand. We address this problem through high-quality 3D asset retrieval from multi-modal inputs, including 2D sketches, images and text. We use CLIP as it provides a bridge to higher-level latent features. We use these features to perform a multi-modality fusion to address the lack of artistic control that affects common data-driven approaches. Our approach allows for multi-modal conditional feature-driven retrieval through a 3D asset database, by utilizing a combination of input latent embeddings. We explore the effects of different combinations of feature embeddings across different input types and weighting methods.
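The abstract does not spell out the weighting methods, so the following is only a minimal sketch of what a weighted fusion of input embeddings followed by similarity-based retrieval could look like. It assumes a weighted sum of L2-normalized embeddings and cosine-similarity ranking. The function names (`fuse_embeddings`, `retrieve`), the three-dimensional toy vectors standing in for CLIP embeddings, and the asset names are all hypothetical and are not taken from the paper.

```python
import math


def normalize(vec):
    """Return vec scaled to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def fuse_embeddings(embeddings, weights):
    """Weighted sum of unit-normalized embeddings, renormalized.

    Assumption: one scalar weight per input modality (e.g. sketch, text).
    """
    if not embeddings or len(embeddings) != len(weights):
        raise ValueError("need one weight per embedding")
    dim = len(embeddings[0])
    fused = [0.0] * dim
    for emb, weight in zip(embeddings, weights):
        unit = normalize(emb)
        for i in range(dim):
            fused[i] += weight * unit[i]
    return normalize(fused)


def retrieve(query, database, top_k=3):
    """Rank database entries by cosine similarity to the query embedding."""
    q = normalize(query)
    scored = []
    for name, vec in database.items():
        unit = normalize(vec)
        scored.append((name, sum(a * b for a, b in zip(q, unit))))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy stand-ins for embeddings of a sketch and a text prompt (hypothetical).
sketch_emb = [1.0, 0.0, 0.0]
text_emb = [0.0, 1.0, 0.0]

# Toy stand-ins for precomputed embeddings of 3D assets (hypothetical).
assets = {
    "chair": [0.9, 0.1, 0.0],
    "lamp": [0.1, 0.9, 0.0],
    "table": [0.0, 0.0, 1.0],
}

# Favor the sketch over the text prompt, then favor the text instead.
sketch_heavy = retrieve(fuse_embeddings([sketch_emb, text_emb], [0.8, 0.2]), assets, top_k=2)
text_heavy = retrieve(fuse_embeddings([sketch_emb, text_emb], [0.2, 0.8]), assets, top_k=2)
```

In this sketch, changing the weights shifts which asset ranks first, which is one simple way an artist-facing control over the influence of each input modality could be exposed.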

Original language: English (US)
Title of host publication: Proceedings - SIGGRAPH Asia 2022
Subtitle of host publication: Technical Communications
Editors: Stephen N. Spencer
Publisher: Association for Computing Machinery, Inc
ISBN (Electronic): 9781450394659
State: Published - Dec 6 2022
Event: SIGGRAPH Asia 2022 Technical Communications - Computer Graphics and Interactive Techniques Conference - Asia, SA 2022 - Daegu, Korea, Republic of
Duration: Dec 6 2022 - Dec 9 2022

Publication series

Name: Proceedings - SIGGRAPH Asia 2022: Technical Communications

Conference

Conference: SIGGRAPH Asia 2022 Technical Communications - Computer Graphics and Interactive Techniques Conference - Asia, SA 2022
Country/Territory: Korea, Republic of
City: Daegu
Period: 12/6/22 - 12/9/22

Keywords

  • computer graphics
  • neural networks

ASJC Scopus subject areas

  • Computer Graphics and Computer-Aided Design
  • Computer Vision and Pattern Recognition
  • Software
