ThemeStation: Generating Theme-Aware 3D Assets from Few Exemplars

ThemeStation can generate a gallery of theme-consistent 3D assets from just a few exemplars.

Abstract

Real-world applications often require a large gallery of 3D assets that share a consistent theme. While remarkable advances have been made in general 3D content creation from text or images, synthesizing customized 3D assets that follow the shared theme of input 3D exemplars remains an open and challenging problem. In this work, we present ThemeStation, a novel approach for theme-aware 3D-to-3D generation. Given a few exemplars, ThemeStation synthesizes customized 3D assets with two goals: 1) unity, generating 3D assets that thematically align with the given exemplars, and 2) diversity, generating 3D assets with a high degree of variation. To this end, we design a two-stage framework that first draws a concept image and then performs reference-informed 3D modeling. We propose a novel dual score distillation (DSD) loss to jointly leverage priors from both the input exemplars and the synthesized concept image. Extensive experiments and user studies confirm that ThemeStation surpasses prior works in producing diverse, theme-aware 3D models of impressive quality. ThemeStation also enables various applications, such as controllable 3D-to-3D generation.

Approach Overview

[Figure: overview of the ThemeStation method]

Given just one reference model (as in this figure) or a few reference models (exemplars), our approach generates theme-consistent 3D models in two stages. In the first stage, we fine-tune a pre-trained text-to-image diffusion model to form a customized, theme-driven diffusion model that produces diverse concept images. In the second stage, we perform reference-informed 3D asset modeling: a rough initial model, obtained by applying an off-the-shelf image-to-3D method to the concept image (omitted in this figure for brevity), is progressively optimized into a refined, high-quality generated model. The optimization uses a novel dual score distillation (DSD) loss, which applies the concept prior and the reference prior at different noise levels (denoising timesteps).
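The idea of routing two priors by noise level can be illustrated with a minimal NumPy sketch. This is not the authors' implementation, and every name in it (`T`, `alpha_bars`, `sds_grad`, `dual_score_distillation_grad`, `t_split`, `make_oracle_denoiser`) is hypothetical. It assumes that each prior contributes an ordinary score-distillation (SDS-style) gradient, a standard DDPM linear noise schedule, the common weighting w(t) = 1 - ᾱ_t, and an illustrative split in which higher-noise timesteps query the concept prior and lower-noise timesteps query the reference prior. The two fine-tuned diffusion models are replaced by toy "oracle" denoisers, and the rendered image is optimized directly instead of backpropagating through a differentiable renderer into 3D parameters.

```python
import numpy as np

# Assumed DDPM-style linear noise schedule; the real models' schedule may differ.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bars = np.cumprod(1.0 - betas)  # alpha_bar_t for t = 0..T-1 (larger t = more noise)


def sds_grad(x, denoiser, t, rng):
    """Score-distillation gradient w.r.t. a rendered image x at timestep t."""
    eps = rng.standard_normal(x.shape)
    a = alpha_bars[t]
    x_t = np.sqrt(a) * x + np.sqrt(1.0 - a) * eps  # forward-diffuse the rendering
    eps_hat = denoiser(x_t, t)  # the prior's noise prediction
    w = 1.0 - a  # a common SDS weighting, w(t) = 1 - alpha_bar_t
    return w * (eps_hat - eps)


def dual_score_distillation_grad(x, concept_prior, reference_prior, rng, t_split=500):
    """Hypothetical DSD routing (assumed split): high-noise timesteps (t >= t_split)
    query the concept prior, low-noise timesteps query the reference prior."""
    t = int(rng.integers(0, T))
    if t >= t_split:
        return sds_grad(x, concept_prior, t, rng), "concept", t
    return sds_grad(x, reference_prior, t, rng), "reference", t


def make_oracle_denoiser(target):
    """Hypothetical toy prior whose data distribution is the single image `target`;
    its exact noise prediction pulls any rendering toward `target`."""
    def denoiser(x_t, t):
        a = alpha_bars[t]
        return (x_t - np.sqrt(a) * target) / np.sqrt(1.0 - a)
    return denoiser


rng = np.random.default_rng(0)
concept_prior = make_oracle_denoiser(np.zeros(4))   # toy "concept image" target
reference_prior = make_oracle_denoiser(np.ones(4))  # toy "exemplar" target
x = np.full(4, 5.0)  # toy stand-in for a rendering of the rough initial model
lr = 0.5
for _ in range(2000):
    g, _, _ = dual_score_distillation_grad(x, concept_prior, reference_prior, rng)
    x = x - lr * g  # gradient step on the image itself (no renderer in this sketch)
print(x)
```

Because each toy prior pulls the rendering toward its own target, the optimized values settle between the concept target (0) and the reference target (1). In the actual method the priors would be the customized diffusion models, and which prior governs which noise range is a design choice that this sketch only assumes.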

Controllable 3D-to-3D

ThemeStation supports controllable 3D-to-3D generation guided by user-specified text prompts. This suggests that ThemeStation can be combined seamlessly with emerging controllable image generation techniques to enable further 3D-to-3D applications.
