Authored By: Dejia Xu, Ye Yuan, Morteza Mardani, Sifei Liu, Jiaming Song, Zhangyang Wang, Arash Vahdat

AGG: Amortized Generative 3D Gaussians for Single Image to 3D

Jan 24, 2024

Amortized Generative 3D Gaussians: A Novel Approach to Single Image to 3D Conversion

Authors: Dejia Xu, University of Texas at Austin; Ye Yuan, Morteza Mardani, Sifei Liu, Jiaming Song, Zhangyang Wang, Arash Vahdat, NVIDIA

As automatic 3D content creation pipelines become more important, many approaches have been explored for generating 3D objects from single images. One representation that has recently stood out in both 3D reconstruction and generation is 3D Gaussian splatting, recognized for its rendering efficiency. Existing 3D Gaussian splatting approaches to this task are often optimization-based and involve many computationally expensive score-distillation steps for every object.

To overcome this cost, the researchers introduce the Amortized Generative 3D Gaussian framework (AGG), which produces 3D Gaussians from a single image instantly and so eliminates the need for per-instance optimization. The method uses a cascaded pipeline that first generates a coarse representation of the 3D data and then upsamples it with a 3D Gaussian super-resolution module. The authors evaluate AGG against existing optimization-based 3D Gaussian frameworks; it demonstrates competitive qualitative and quantitative generation ability while being significantly faster.

Image-to-3D generation allows users to control the generated content and has seen significant advances thanks to large-scale 3D-aware generative models. Representations such as point clouds, voxels, and occupancy grids have been studied as the media for training these models. 3D Gaussian splatting has gained attention for its high-quality real-time rendering, but generating 3D Gaussians has remained less studied. AGG differs from existing work in that it uses a cascaded generation framework that produces 3D Gaussians instantly.
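To make concrete what "producing 3D Gaussians" involves, the following is a minimal sketch of the parameters one Gaussian carries in standard 3D Gaussian splatting, and of how its covariance is assembled as Sigma = R S S^T R^T from a rotation R and a diagonal scale matrix S. The class name `Gaussian3D`, the field layout, and the use of a plain RGB color (instead of view-dependent color coefficients) are simplifying assumptions for illustration; none of this is taken from the AGG code.

```python
import math
from dataclasses import dataclass


@dataclass
class Gaussian3D:
    """One splat. Field names are illustrative, not from AGG."""
    position: tuple  # (x, y, z) center of the Gaussian
    scale: tuple     # (sx, sy, sz) per-axis standard deviations
    rotation: tuple  # quaternion (w, x, y, z); normalized before use
    opacity: float   # in [0, 1]
    color: tuple     # (r, g, b); a simplification of view-dependent color


def quat_to_matrix(q):
    """Convert a quaternion (w, x, y, z) to a 3x3 rotation matrix."""
    w, x, y, z = q
    n = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / n, x / n, y / n, z / n
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def covariance(g):
    """Sigma = R S S^T R^T, with S = diag(scale)."""
    R = quat_to_matrix(g.rotation)
    s2 = [s * s for s in g.scale]
    return [
        [sum(R[i][k] * s2[k] * R[j][k] for k in range(3)) for j in range(3)]
        for i in range(3)
    ]
```

With the identity rotation `(1, 0, 0, 0)` and scales `(1, 2, 3)`, `covariance` returns the diagonal matrix with entries 1, 4, 9. Building the covariance from a rotation and a scale keeps it symmetric and positive semi-definite no matter what values a network predicts, which is one reason this parameterization suits direct prediction.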
The geometry and texture of the 3D Gaussians are optimized jointly and stably using a hybrid representation and separate transformers. A super-resolution module is used in the second stage for effective upscaling. Compared with optimization-based 3D Gaussian pipelines and sampling-based frameworks using other 3D representations, AGG demonstrates competitive performance. It enables zero-shot image-to-object generation and operates several orders of magnitude faster. It generates 3D Gaussians in one shot, which previous models could not achieve. An amortized model that predicts the 3D Gaussians directly, rather than constructing them through optimization, represents a significant step forward in image-to-3D generation. Despite these promising advances, more research is needed to further improve the efficiency and accuracy of these models.
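The two-stage data flow described above (a coarse prediction followed by super-resolution, with no per-object optimization loop) can be sketched as below. This is only a toy illustration of the cascade's shape: `coarse_generator`, `super_resolve`, `image_to_gaussians`, the dict layout, and the random placeholder logic are all hypothetical stand-ins. In AGG both stages are learned networks, and nothing here reproduces their architecture.

```python
import random


def coarse_generator(image, num_gaussians=8, seed=0):
    """Hypothetical stage 1: one pass from image to a few coarse Gaussians.

    A trained model would infer positions and colors from the image;
    this placeholder uses random positions and the mean pixel value.
    """
    rng = random.Random(seed)
    mean = sum(sum(row) for row in image) / (len(image) * len(image[0]))
    return [
        {
            "position": tuple(rng.uniform(-1, 1) for _ in range(3)),
            "scale": 0.2,
            "color": (mean, mean, mean),
        }
        for _ in range(num_gaussians)
    ]


def super_resolve(gaussians, factor=4, seed=0):
    """Hypothetical stage 2: each coarse Gaussian spawns `factor` smaller ones.

    The jitter and the shrink rule are arbitrary choices for illustration.
    """
    rng = random.Random(seed)
    fine = []
    for g in gaussians:
        for _ in range(factor):
            offset = tuple(rng.gauss(0.0, g["scale"]) for _ in range(3))
            fine.append({
                "position": tuple(p + o for p, o in zip(g["position"], offset)),
                "scale": g["scale"] / factor ** (1 / 3),
                "color": g["color"],
            })
    return fine


def image_to_gaussians(image):
    """Amortized inference: two feed-forward stages, no optimization loop."""
    coarse = coarse_generator(image)
    return super_resolve(coarse)
```

Calling `image_to_gaussians([[0.5, 0.5], [0.5, 0.5]])` returns 32 Gaussians (8 coarse ones, each split into 4). The point of the sketch is the control flow: the cost per object is a fixed number of function calls, whereas an optimization-based pipeline would iterate over rendering and gradient steps for each new image.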