Pix2NeRF: Unsupervised Conditional π-GAN for Single Image to Neural Radiance Fields Translation

Shengqu Cai, Anton Obukhov, Dengxin Dai, Luc Van Gool

We propose a pipeline to generate Neural Radiance Fields (NeRF) of an object or a scene of a specific class, conditioned on a single input image. This is a challenging task, as training NeRF requires multiple views of the same scene, coupled with corresponding poses, which are hard to obtain. Our method is based on π-GAN, a generative model for unconditional 3D-aware image synthesis, which maps random latent codes to radiance fields of a class of objects. We jointly optimize (1) the π-GAN objective to utilize its high-fidelity 3D-aware generation and (2) a carefully designed reconstruction objective. The latter includes an encoder coupled with the π-GAN generator to form an auto-encoder. Unlike previous few-shot NeRF approaches, our pipeline is unsupervised: it can be trained with independent images, without 3D, multi-view, or pose supervision. Applications of our pipeline include 3D avatar generation, object-centric novel view synthesis from a single input image, and 3D-aware super-resolution, to name a few.
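The joint objective described above couples the adversarial loss with reconstruction terms that make the encoder E and generator G behave as an auto-encoder. The toy sketch below illustrates only those reconstruction and GAN-inversion terms with linear stand-ins for E and G; all module names, shapes, and loss weights are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear stand-ins for the Pix2NeRF modules:
#   G: latent code z -> rendered image (here: a fixed linear map)
#   E: image -> predicted latent code (here: the pseudo-inverse of G)
LATENT_DIM, IMG_DIM = 8, 16
W_g = rng.normal(size=(IMG_DIM, LATENT_DIM))  # "generator" weights
W_e = np.linalg.pinv(W_g)                     # "encoder" weights

def G(z):
    return W_g @ z

def E(x):
    return W_e @ x

def reconstruction_loss(x):
    # Auto-encoder objective: re-render G(E(x)) and compare to the input image.
    return float(np.mean((G(E(x)) - x) ** 2))

def gan_inversion_loss(z):
    # Latent calibration: encoding a generated image should recover its code.
    return float(np.mean((E(G(z)) - z) ** 2))

z = rng.normal(size=LATENT_DIM)
x = G(z)  # a synthetic "real" image lying on the generator manifold

# The full Pix2NeRF loss would add the (omitted) adversarial term; the
# unit weights here are placeholders.
total = 1.0 * reconstruction_loss(x) + 1.0 * gan_inversion_loss(z)
print(total)
```

Because the toy encoder is the exact pseudo-inverse of the toy generator, both terms vanish for images on the generator manifold; in the real pipeline, E is a trained network and these losses are minimized jointly with the GAN objective.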

Figure 1. Overview of Pix2NeRF: We propose a method for unsupervised learning of neural representations of scenes, sharing a common pose prior. At test time, Pix2NeRF disentangles pose and content from an input image and renders novel views of the content. Top: π-GAN is trained on a dataset without pose supervision. Bottom: a trained model is conditioned on a single image to obtain pose-dependent views.

Figure 2. Overview of the building blocks and objectives used in Pix2NeRF. GAN objectives follow π-GAN and ensure that NeRF outputs match the distribution of real images p_real under the latent prior p_z and pose prior p_d. Reconstruction and GAN inversion objectives ensure calibrated latent representations, such that E and G can operate as an auto-encoder. The conditional adversarial objective enables learning better representations without explicit pose supervision. Legend: green – trained module, blue – frozen, gradient – warm-up.

Figure 3. Reconstructions and novel views on CARLA, CelebA, and ShapeNet-SRN chairs.

Figure 4. Qualitative results of ablation studies, obtained with an image from the test split of CelebA.

Publication:

Pix2NeRF: Unsupervised Conditional π-GAN for Single Image to Neural Radiance Fields Translation

Shengqu Cai, Anton Obukhov, Dengxin Dai, Luc Van Gool

CVPR 2022

[Paper][Code][BibTex]