A simplified and self-contained tutorial for understanding 3D Gaussian Splatting [Under Construction]
The goal of this blog is to cover the concepts and explanations necessary to understand 3D Gaussian Splatting. I created it for my own learning, and to make learning easier for others as well, without the need to scour the internet gathering and piecing together the relevant bits of knowledge. This tutorial is based on several readings from different sources, which will be cited with links.
3D Gaussian Splatting is a deep learning-based method for building an explicit 3D representation of a scene, which allows projecting the scene onto a 2D surface from different viewpoints. The 3D representation is learned from a few available images of the same scene taken from different viewpoints, along with their camera position information. The goal is to make the representation general enough that we can project the scene onto novel viewpoints. In addition, whatever representation we choose, we need an algorithm that renders it to a 2D image.
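To make "projecting the scene onto a 2D surface from a viewpoint" concrete, here is a minimal sketch of the standard pinhole-camera projection. The function name, the focal length `f`, and the principal point `(cx, cy)` are illustrative assumptions, not part of any particular method discussed here.

```python
import numpy as np

def project(points_w, R, t, f=500.0, cx=320.0, cy=240.0):
    """Project 3D world points onto a 2D image plane (pinhole model).

    points_w: (N, 3) world points; R (3x3) rotation and t (3,) translation
    give the world-to-camera pose. Returns (N, 2) pixel coordinates.
    """
    p_cam = points_w @ R.T + t              # world -> camera coordinates
    x, y, z = p_cam[:, 0], p_cam[:, 1], p_cam[:, 2]
    u = f * x / z + cx                      # perspective division + intrinsics
    v = f * y / z + cy
    return np.stack([u, v], axis=1)

# A point on the optical axis, 2 m in front of an identity-pose camera,
# lands exactly at the principal point (cx, cy).
pts = np.array([[0.0, 0.0, 2.0]])
uv = project(pts, np.eye(3), np.zeros(3))
```

Changing `R` and `t` corresponds to moving the camera, which is exactly what rendering the same scene from a novel viewpoint amounts to.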
A 3D scene can be represented in several ways. These include Neural Radiance Fields (NeRFs) and Gaussian Splatting, which will be discussed in detail in later sections.
In order to train a NeRF, we construct a dataset of \(N\) images of the scene, each with its corresponding camera position information. A neural network (usually an MLP) is then trained to take a 5D coordinate \((x, y, z, \theta, \phi)\) as input, where \((x, y, z)\) is a 3D location and \((\theta, \phi)\) are the angles determining the viewing direction. The network outputs a color and a volume density at that location; the pixels of an image for a particular view are then obtained by accumulating these values along camera rays. The network can therefore be optimized by matching the rendered image against the ground-truth image over the views available in the dataset.
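The input/output contract described above can be sketched as a tiny untrained MLP. This is only a shape-level illustration under assumed sizes: the layer widths and activations here are hypothetical, and the actual NeRF architecture is deeper and applies a positional encoding to the inputs first.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(in_dim=5, hidden=64, out_dim=4):
    """Random weights for a 2-hidden-layer MLP (sizes are illustrative)."""
    return {
        "W1": rng.normal(0, 0.1, (in_dim, hidden)), "b1": np.zeros(hidden),
        "W2": rng.normal(0, 0.1, (hidden, hidden)), "b2": np.zeros(hidden),
        "W3": rng.normal(0, 0.1, (hidden, out_dim)), "b3": np.zeros(out_dim),
    }

def nerf_mlp(params, coords):
    """coords: (N, 5) array of (x, y, z, theta, phi).

    Returns (N, 3) RGB colors in [0, 1] and (N,) non-negative densities.
    """
    h = np.maximum(coords @ params["W1"] + params["b1"], 0.0)   # ReLU
    h = np.maximum(h @ params["W2"] + params["b2"], 0.0)        # ReLU
    out = h @ params["W3"] + params["b3"]
    rgb = 1.0 / (1.0 + np.exp(-out[:, :3]))   # sigmoid keeps color in [0, 1]
    sigma = np.log1p(np.exp(out[:, 3]))       # softplus keeps density >= 0
    return rgb, sigma

params = init_mlp()
rgb, sigma = nerf_mlp(params, rng.normal(size=(8, 5)))
```

The output activations matter: the color must be a valid RGB value and the density must be non-negative, since both are fed into the volume-rendering integral that produces pixel colors.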