Pix2NeRF: Unsupervised Conditional π-GAN for Single Image to Neural Radiance Fields Translation
Shengqu Cai, Anton Obukhov, Dengxin Dai, Luc Van Gool

Figure 1. Overview of Pix2NeRF: We propose a method for unsupervised learning of neural representations of scenes, sharing a common pose prior. At test time, Pix2NeRF disentangles pose and content from an input image and renders novel views of the content. Top: π-GAN is trained on a dataset without pose supervision. Bottom: a trained model is conditioned on a single image to obtain pose-dependent views.

Figure 2. Overview of the building blocks and objectives used in Pix2NeRF. GAN objectives follow π-GAN and ensure that NeRF outputs match the distribution of real images p_real under the latent prior p_z and pose prior p_d. Reconstruction and GAN-inversion objectives ensure calibrated latent representations, so that E and G can operate as an auto-encoder. The conditional adversarial objective enables learning better representations without explicit pose supervision. Legend: green – trained module; blue – frozen; gradient – warm-up.
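The interplay of the objectives in the caption can be sketched schematically. In the toy sketch below, the names E (encoder), G (generator/NeRF renderer), and D (discriminator) follow the caption, but every internal function is an illustrative stand-in, not the paper's architecture; the three loss terms only mirror the roles of the adversarial, reconstruction, and GAN-inversion objectives, with hypothetical forms and no claimed weighting.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the Pix2NeRF modules (illustrative only):
# E maps an image to a (content latent z, pose d) pair,
# G renders an image from (z, d), D scores realism.
def E(img):
    z = img.mean(axis=-1)        # toy 8-dim "content" code
    d = float(img.mean())        # toy scalar "pose"
    return z, d

def G(z, d):
    return np.tile(z, (8, 1)) + d        # toy rendered 8x8 "image"

def D(img):
    return float(np.tanh(img.mean()))    # toy realism score in (-1, 1)

real = rng.normal(size=(8, 8))           # a "real" training image

# Adversarial objective: renders sampled from the priors p_z and p_d
# should look real to D (non-saturating generator loss, as an example).
z_prior, d_prior = rng.normal(size=8), float(rng.normal())
loss_gan = -D(G(z_prior, d_prior))

# Reconstruction objective: G(E(real)) should match the input image,
# so E and G jointly behave like an auto-encoder.
z_hat, d_hat = E(real)
loss_rec = float(np.mean((G(z_hat, d_hat) - real) ** 2))

# GAN-inversion objective: E should recover the sampled latent
# from a generated render, calibrating the latent space.
z_inv, _ = E(G(z_prior, d_prior))
loss_inv = float(np.mean((z_inv - z_prior) ** 2))

total = loss_gan + loss_rec + loss_inv   # unweighted sum, for illustration
print(total)
```

The point of the sketch is only the structure: one term pulls prior samples toward the real-image distribution, one ties E and G together as an auto-encoder, and one closes the loop from generated images back to their latents.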

Figure 3. Reconstructions and novel views on CARLA, CelebA, and ShapeNet-SRN chairs.

Figure 4. Qualitative results of ablation studies, obtained with an image from the test split of CelebA.