InseRF: Text-Driven Generative Object Insertion in Neural 3D Scenes

¹ETH Zurich, ²Google Zurich
*Equal Contribution

InseRF generates an object in a 3D scene via a text prompt and one 2D bounding box.

Abstract

We introduce InseRF, a novel method for generative object insertion in the NeRF reconstructions of 3D scenes. Based on a user-provided textual description and a 2D bounding box in a reference viewpoint, InseRF generates new objects in 3D scenes. Recently, methods for 3D scene editing have been profoundly transformed, owing to the use of strong priors of text-to-image diffusion models in 3D generative modeling. Existing methods are mostly effective in editing 3D scenes via style and appearance changes or removing existing objects. Generating new objects, however, remains a challenge for such methods, which we address in this study. Specifically, we propose grounding the 3D object insertion to a 2D object insertion in a reference view of the scene. The 2D edit is then lifted to 3D using a single-view object reconstruction method. The reconstructed object is then inserted into the scene, guided by the priors of monocular depth estimation methods. We evaluate our method on various 3D scenes and provide an in-depth analysis of the proposed components. Our experiments with generative insertion of objects in several 3D scenes indicate the effectiveness of our method compared to the existing methods. InseRF is capable of controllable and 3D-consistent object insertion without requiring explicit 3D information as input.


Overview of InseRF


Given a single reference view annotated with a 2D bounding box and a text prompt describing the object to insert, InseRF first generates a 2D edit of that view that depicts the object. The 2D edit is then lifted to a 3D model of the object using single-view object reconstruction, and the object is placed into the scene, guided by monocular depth estimation. After the 3D placement, the object and scene representations are fused. Finally, an optional refinement step can further improve the appearance.

Method overview.
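To make the depth-guided placement step more concrete, the following is a minimal sketch of how a 2D bounding box and a depth estimate could be turned into a 3D position and approximate size. It is not the InseRF implementation. It assumes a pinhole camera with a 3x3 intrinsics matrix `K`, a 4x4 camera-to-world matrix `cam_to_world`, an OpenCV-style camera frame (x right, y down, z forward), and a single scalar z-depth for the object. The function names `unproject_pixel` and `place_object_from_bbox` are hypothetical.

```python
import numpy as np


def unproject_pixel(u, v, depth, K, cam_to_world):
    """Back-project pixel (u, v) at a given z-depth to a 3D world point.

    Assumes a pinhole camera with z pointing forward (an assumption of this
    sketch; NeRF codebases often use a different axis convention).
    """
    # Normalized image-plane coordinates from the intrinsics.
    x = (u - K[0, 2]) / K[0, 0]
    y = (v - K[1, 2]) / K[1, 1]
    # Homogeneous point in camera coordinates, scaled by the depth.
    p_cam = np.array([x * depth, y * depth, depth, 1.0])
    # Transform into world coordinates and drop the homogeneous component.
    return (cam_to_world @ p_cam)[:3]


def place_object_from_bbox(bbox, depth, K, cam_to_world):
    """Estimate a 3D center and approximate extents from a 2D bounding box.

    bbox  : (u_min, v_min, u_max, v_max) in pixels in the reference view.
    depth : scalar z-depth of the object, e.g. taken from a monocular depth
            estimate of the edited reference view (hypothetical input).
    Returns (center_world, width, height) in the same units as `depth`.
    """
    u_min, v_min, u_max, v_max = bbox
    # The box center is unprojected to obtain the object's 3D position.
    center = unproject_pixel(
        0.5 * (u_min + u_max), 0.5 * (v_min + v_max), depth, K, cam_to_world
    )
    # Similar triangles: pixel extent * depth / focal length.
    width = (u_max - u_min) * depth / K[0, 0]
    height = (v_max - v_min) * depth / K[1, 1]
    return center, width, height


if __name__ == "__main__":
    K = np.array([[500.0, 0.0, 320.0],
                  [0.0, 500.0, 240.0],
                  [0.0, 0.0, 1.0]])
    pose = np.eye(4)  # camera placed at the world origin
    print(place_object_from_bbox((270, 190, 370, 290), 2.0, K, pose))
```

In this sketch the bounding box fixes where the object sits in the image plane, while the depth value resolves the remaining ambiguity along the camera ray, so the same box at a larger depth yields a proportionally larger object that is farther away.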

Visual Results

Adding a Panettone

Original Scene
Edited Scene
Edited Depth

Adding a Mission blue butterfly

Original Scene
Edited Scene
Edited Depth

Adding a pouffe

Original Scene
Edited Scene
Edited Depth

Adding a garden gnome

Original Scene
Edited Scene
Edited Depth

Adding a handbag

Original Scene
Edited Scene
Edited Depth

Adding a Moai stone statue

Original Scene
Edited Scene
Edited Depth

Adding a pepper grinder

Original Scene
Edited Scene
Edited Depth

Adding a cup

Original Scene
Edited Scene
Edited Depth

Adding a purple crystal

Original Scene
Edited Scene
Edited Depth

Adding a mug

Original Scene
Edited Scene
Edited Depth

BibTeX

@article{shahbazi2024inserf,
  author    = {Shahbazi, Mohamad and Claessens, Liesbeth and Niemeyer, Michael and Collins, Edo and Tonioni, Alessio and Van Gool, Luc and Tombari, Federico},
  title     = {InseRF: Text-Driven Generative Object Insertion in Neural 3D Scenes},
  journal   = {arXiv},
  year      = {2024},
}