A Survey on Deep Generative 3D-aware Image Synthesis

ACM Computing Surveys, 2023
Weihao Xia · Jing-Hao Xue


Introduction

This project lists representative papers, code, and datasets on deep 3D-aware image synthesis. Besides the 3D-aware generative models (GANs and diffusion models) discussed in the survey, this project also covers novel view synthesis studies, especially those based on implicit neural representations such as NeRF.

We aim to constantly update the latest relevant papers and help the community track this topic. Please feel free to join us and contribute to the project. Please do not hesitate to reach out if you have any questions or suggestions.

Survey paper

3D Control of 2D GANs

3D Control Latent Directions

For 3D control over diffusion models similar to that of GANs, please refer to semantic manipulation in diffusion latent spaces.

3D Parameters as Controls

3D Prior Knowledge as Constraints

3D-aware GANs for a Single Image Category

Unconditional 3D Generative Models

Conditional 3D Generative Models

3D-aware Diffusion Models for a Single Image Category

3D-Aware Generative Models on ImageNet

3D-aware Video Synthesis

INR-based 3D Novel View Synthesis

Neural Scene Representations

Acceleration

From Constrained to In-the-wild Conditions

Few Images

Pose-free

Varying Appearance

Large-scale Scene

Dynamic Scene

The following papers are not directly related to 3D-aware image synthesis, but they are worth attention. For example, in our survey, inverse rendering methods are not classified as 3D-aware image synthesis because they are not deliberately designed for this purpose; with the inferred intrinsic components, however, photorealistic images can be rendered. Similarly, 3D reconstruction models geometry only, with no appearance information, and therefore cannot render images with photorealistic textures. Nevertheless, such representations have been adopted as the geometric representation, alongside a textural representation (e.g., Texture Fields), for 3D-aware image synthesis.

3D Representations

Neural Inverse Rendering (Neural De-rendering)

Inverse rendering is to infer underlying intrinsic components of a scene from rendered 2D images. These properties include shape (surface, depth, normal), material (albedo, reflectivity, shininess), and lighting (direction, intensity), which can be further used to render photorealistic images.
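The re-rendering step can be illustrated with a toy Lambertian shading model: given per-pixel intrinsics (albedo, surface normals) and a directional light, an image is recomposed as albedo times the clamped dot product of normal and light direction. This is a minimal sketch with hypothetical names and shapes, not the method of any particular paper.

```python
import numpy as np

def rerender(albedo, normals, light_dir, light_intensity=1.0):
    """Compose an image from intrinsics: I = albedo * max(n . l, 0) * intensity."""
    l = light_dir / np.linalg.norm(light_dir)         # unit light direction
    shading = np.clip(normals @ l, 0.0, None)         # per-pixel Lambertian term
    return albedo * shading[..., None] * light_intensity

# Toy inputs: a 2x2 image with flat gray albedo and upward-facing normals.
albedo = np.full((2, 2, 3), 0.5)                      # constant diffuse color
normals = np.zeros((2, 2, 3))
normals[..., 2] = 1.0                                 # all normals point along +z
image = rerender(albedo, normals, light_dir=np.array([0.0, 0.0, 1.0]))
print(image[0, 0])  # -> [0.5 0.5 0.5]
```

A light aligned with the normals gives full shading (0.5 everywhere); tilting the light darkens the image, which is exactly the lighting control that inferred intrinsics enable.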

Neural Rerendering

Datasets

Summary of popular 3D-aware image synthesis datasets.

Multi-view image collections

The images are rendered or collected under different experimental settings: the Synthetic-NeRF, DTU, and Tanks and Temples datasets for general purposes; the crowded Phototourism dataset for varying lighting conditions; the Blender Forward Facing (BLEFF) dataset to benchmark camera parameter estimation and novel view synthesis quality; and the San Francisco Alamo Square dataset for large-scale scenes.

Examples of multi-view image datasets.

| dataset | published in | # scenes | # samples per scene | range (m × m) | resolution | keyword |
| --- | --- | --- | --- | --- | --- | --- |
| DeepVoxels | CVPR 2019 | 4 simple objects | 479 / 1,000 | \ | 512 × 512 | synthetic, 360 degree |
| NeRF Synthetics | ECCV 2020 | 8 complex objects | 100 / 200 | \ | 800 × 800 | synthetic, 360 degree |
| NeRF Captured | ECCV 2020 | 8 complex scenes | 20-62 | a few | 1,008 × 756 | real, forward-facing |
| DTU | CVPR 2014 | 124 scenes | 49 or 64 | a few to thousand | 1,600 × 1,200 | often used in few-views |
| Tanks and Temples | ACM TOG 2017 | 14 objects and scenes | 4,395-21,871 | dozen to thousand | 8-megapixel | real, large-scale |
| Phototourism | IJCV 2021 | 6 landmarks | 763-2,000 | dozen to thousand | 564-1,417 megapixel | varying illumination |
| Alamo Square | CVPR 2022 | San Francisco | 2,818,745 | 570 × 960 | 1,200 × 900 | real, large-scale |

Single-view image collections

Summary of popular single-view image datasets organized by their major categories and sorted by their popularity.

| dataset | published in | category | # samples | resolution | keyword |
| --- | --- | --- | --- | --- | --- |
| FFHQ | CVPR 2019 | Human Face | 70k | 1024 × 1024 | single, simple-shape |
| AFHQ | CVPR 2020 | Cat, Dog, and Wildlife | 15k | 512 × 512 | single, simple-shape |
| CompCars | CVPR 2015 | Real Car | 136k | 256 × 256 | single, simple-shape |
| CARLA | CoRL 2017 | Synthetic Car | 10k | 128 × 128 | single, simple-shape |
| CLEVR | CVPR 2017 | Objects | 100k | 256 × 256 | multiple, simple-shape |
| LSUN | 2015 | Bedroom | 300k | 256 × 256 | single, simple-shape |
| CelebA | ICCV 2015 | Human Face | 200k | 178 × 218 | single, simple-shape |
| CelebA-HQ | ICLR 2018 | Human Face | 30k | 1024 × 1024 | single, simple-shape |
| MetFaces | NeurIPS 2020 | Art Face | 1,336 | 1024 × 1024 | single, simple-shape |
| M-Plants | NeurIPS 2022 | Plants | 141,824 | 256 × 256 | single, variable-shape |
| M-Food | NeurIPS 2022 | Food | 25,472 | 256 × 256 | single, variable-shape |

Citation

If this repository benefits your research, please consider citing our paper.

  @article{xia2023survey,
    title={A Survey on Deep Generative 3D-aware Image Synthesis},
    author={Xia, Weihao and Xue, Jing-Hao},
    journal={ACM Computing Surveys},
    year={2023}
  }

License

This work is licensed under a Creative Commons Attribution 4.0 International License.