3D Gaussian Splatting is a rasterization technique introduced in the paper 3D Gaussian Splatting for Real-Time Radiance Field Rendering. It enables real-time rendering of photorealistic scenes learned from a small set of images. This article explains how it works and what it means for the future of graphics.
What is 3D Gaussian Splatting?
At its core, 3D Gaussian Splatting is a rasterization method: it takes scene data and draws it on the screen, much like triangle rasterization in traditional computer graphics. Instead of triangles, however, it draws 3D Gaussians—ellipsoidal blobs each described by a position (XYZ), a covariance (a 3×3 matrix encoding scale and rotation), a color (RGB), and an alpha (transparency).
A single Gaussian looks like a soft blob. When you combine millions of them, they can represent complex scenes with fine details. For example, a scene like a bicycle can be reconstructed using around 7 million Gaussians, each contributing to the final image.
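In practice, the covariance is not stored as a raw 3×3 matrix but as a per-axis scale and a rotation, which are combined into a valid (symmetric, positive semi-definite) covariance. A minimal numpy sketch of that construction—the helper names here are illustrative, not taken from any reference code:

```python
import numpy as np

def quat_to_rotmat(q):
    """Convert a unit quaternion (w, x, y, z) to a 3x3 rotation matrix."""
    w, x, y, z = q / np.linalg.norm(q)
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])

def covariance(scale, quat):
    """Build the covariance Sigma = R S S^T R^T from per-axis scales
    and a rotation quaternion."""
    R = quat_to_rotmat(np.asarray(quat, dtype=float))
    S = np.diag(scale)          # diagonal scaling matrix
    M = R @ S
    return M @ M.T              # symmetric and positive semi-definite

# A Gaussian stretched along x, then rotated 90 degrees about z,
# so its long axis ends up along y:
sigma = covariance([2.0, 0.5, 0.5], [np.cos(np.pi/4), 0, 0, np.sin(np.pi/4)])
```

Factoring the covariance this way keeps it valid throughout optimization: an unconstrained 3×3 matrix could drift into something that is no longer a covariance, while any scale-plus-rotation pair always yields one.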
How It Works
1. Structure from Motion (SfM): First, a set of 2D images is processed with an SfM tool (e.g., COLMAP) to estimate a sparse 3D point cloud of the scene.
2. Convert to Gaussians: Each point in the point cloud is converted into a Gaussian. At this stage only position and color are known from SfM; to reach high quality, the Gaussians must be trained.
3. Training via Stochastic Gradient Descent (SGD): The training loop is:
   - Rasterize the Gaussians to an image using differentiable Gaussian rasterization.
   - Compute the loss between the rendered image and the ground-truth image.
   - Adjust the Gaussian parameters (position, covariance, color, alpha) to reduce the loss.
   - Apply automated densification and pruning: a Gaussian with a large positional gradient (a sign that it is failing to reconstruct its region) is cloned if it is small or split if it is large, and Gaussians with very low alpha are removed.
4. Differentiable Gaussian Rasterization: The rasterizer is both fast and differentiable. For each frame it:
   - projects each Gaussian into 2D from the camera's perspective,
   - sorts the Gaussians by depth, and
   - blends the Gaussians front-to-back at every pixel.
   Differentiability is crucial for training, but once trained, the Gaussians can be rendered with non-differentiable methods.
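The clone/split/prune heuristic from the densification step in (3) can be sketched as a simple per-Gaussian decision rule. The threshold values below are placeholders for illustration only; the real ones are tuned hyperparameters:

```python
# Hypothetical thresholds, chosen for illustration; not the paper's values.
GRAD_THRESHOLD = 0.0002   # view-space positional gradient magnitude
SCALE_THRESHOLD = 0.01    # boundary between "small" and "large" Gaussians
ALPHA_THRESHOLD = 0.005   # prune nearly transparent Gaussians

def densify_decision(grad_mag, max_scale, alpha):
    """Return the densification action for one Gaussian:
    'prune', 'clone', 'split', or 'keep'."""
    if alpha < ALPHA_THRESHOLD:
        return "prune"                    # nearly invisible: remove it
    if grad_mag > GRAD_THRESHOLD:         # large gradient: under-reconstructed
        return "clone" if max_scale < SCALE_THRESHOLD else "split"
    return "keep"
```

Cloning adds detail where a small Gaussian cannot cover its region alone; splitting breaks an oversized Gaussian into finer pieces; pruning keeps the total count (and memory) in check.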
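The front-to-back blending in (4) accumulates color weighted by each Gaussian's alpha and the light still transmitted past the Gaussians in front. A simplified numpy sketch for a single pixel, ignoring the 2D Gaussian falloff that modulates each alpha in a real rasterizer:

```python
import numpy as np

def blend_pixel(colors, alphas):
    """Front-to-back alpha compositing for one pixel.
    `colors` are RGB tuples and `alphas` the per-pixel opacities of the
    Gaussians covering this pixel, already sorted near-to-far."""
    out = np.zeros(3)
    transmittance = 1.0           # fraction of light still unblocked
    for c, a in zip(colors, alphas):
        out += transmittance * a * np.asarray(c, dtype=float)
        transmittance *= (1.0 - a)
        if transmittance < 1e-4:  # early exit: pixel is effectively opaque
            break
    return out

# An opaque red Gaussian in front fully hides the green one behind it:
pixel = blend_pixel([(1, 0, 0), (0, 1, 0)], [1.0, 1.0])  # -> [1, 0, 0]
```

The early-exit once transmittance is negligible is one reason the rasterizer is fast: fully occluded Gaussians never contribute to a pixel.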
Why All the Buzz?
The excitement stems from the method's ability to produce high-quality, real-time rendered scenes. However, there are many open questions: Can Gaussians be animated? (A paper on dynamic 3D Gaussians suggests yes.) Can they handle reflections? Can they be modeled without reference images? Additionally, 3D Gaussian Splatting offers a dense 3D representation that could benefit Embodied AI, where understanding 3D space remains a challenge.
The Future of Graphics
Pros:
- High-quality, photorealistic scenes
- Real-time rasterization speed
- Relatively fast training
Cons:
- High VRAM usage (4 GB for viewing, 12 GB for training)
- Large disk size (1 GB+ per scene)
- Not compatible with existing rendering pipelines (e.g., Vulkan, DirectX)
- Currently static (though research into dynamic Gaussians is emerging)
The original CUDA implementation has not yet been adapted to production pipelines. Early adaptations include a remote viewer and other experimental implementations, but widespread adoption awaits integration with standard graphics APIs.