Rasterization

Given a set of 3D Gaussians parametrized by means \(\mu \in \mathbb{R}^3\), covariances \(\Sigma \in \mathbb{R}^{3 \times 3}\), colors \(c\), and opacities \(o\), we first compute their projected means \(\mu' \in \mathbb{R}^2\) and covariances \(\Sigma' \in \mathbb{R}^{2 \times 2}\) on the image plane. The Gaussians are then grouped by the tiles they overlap and sorted within each tile by increasing depth \(z\). Finally, each pixel within a tile is rendered with alpha-compositing.
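
The per-pixel compositing step can be illustrated with a minimal sketch. It assumes the standard front-to-back alpha-compositing rule, in which each Gaussian contributes its color weighted by its alpha and by the transmittance left over from nearer Gaussians. The function name composite_pixel and the plain-list inputs are hypothetical and are not part of this function's API.

```python
def composite_pixel(depths, alphas, colors):
    """Front-to-back alpha compositing of the Gaussians covering one pixel.

    depths: per-Gaussian depth z.
    alphas: per-Gaussian alpha at this pixel (assumed already evaluated).
    colors: per-Gaussian RGB tuples.
    """
    order = sorted(range(len(depths)), key=lambda i: depths[i])  # increasing z
    pixel = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light not yet absorbed
    for i in order:
        weight = alphas[i] * transmittance
        pixel = [p + weight * c for p, c in zip(pixel, colors[i])]
        transmittance *= 1.0 - alphas[i]
    return pixel, 1.0 - transmittance  # color and accumulated alpha


# A blue Gaussian at z=2 behind a red Gaussian at z=1, both half transparent.
color, alpha = composite_pixel(
    depths=[2.0, 1.0],
    alphas=[0.5, 0.5],
    colors=[(0.0, 0.0, 1.0), (1.0, 0.0, 0.0)],
)
```

The nearer red Gaussian receives weight 0.5 and the farther blue one only 0.25, which is why the depth ordering matters.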

Note that the 3D covariances are reparametrized with a scaling matrix \(S = \text{diag}(s) \in \mathbb{R}^{3 \times 3}\), represented by a scale vector \(s \in \mathbb{R}^3\), and a rotation matrix \(R \in \mathbb{R}^{3 \times 3}\), represented by a rotation quaternion \(q \in \mathbb{R}^4\):

\[\Sigma = RSS^{T}R^{T}\]
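
As a rough illustration of this reparametrization, the sketch below builds \(\Sigma\) from a quaternion and a scale vector with NumPy. The quaternion ordering (w, x, y, z) is an assumption, since the text above does not state it, and the helper names quat_to_rotmat and covariance_3d are hypothetical.

```python
import numpy as np


def quat_to_rotmat(q):
    """Rotation matrix from a quaternion, ASSUMED to be ordered (w, x, y, z)."""
    w, x, y, z = q / np.linalg.norm(q)  # normalize, so the input need not be unit
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])


def covariance_3d(q, s):
    """Sigma = R S S^T R^T with S = diag(s)."""
    R = quat_to_rotmat(np.asarray(q, dtype=float))
    S = np.diag(np.asarray(s, dtype=float))
    return R @ S @ S.T @ R.T


# The identity rotation leaves the squared scales on the diagonal.
sigma = covariance_3d(q=[1.0, 0.0, 0.0, 0.0], s=[0.1, 0.2, 0.3])
```

Building \(\Sigma\) this way keeps it symmetric and positive semi-definite for any \(q\) and \(s\), which is the usual motivation for optimizing the factors rather than the matrix entries directly.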

The projection of 3D Gaussians is approximated with the Jacobian of the perspective projection equation:

\[\begin{split}J = \begin{bmatrix} f_{x}/z & 0 & -f_{x} t_{x}/z^{2} \\ 0 & f_{y}/z & -f_{y} t_{y}/z^{2} \\ 0 & 0 & 0 \end{bmatrix}\end{split}\]
\[\Sigma' = J W \Sigma W^{T} J^{T}\]

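where \([W | t]\) is the world-to-camera transformation matrix and \(f_{x}, f_{y}\) are the focal lengths of the camera. In \(J\), \(t_{x}, t_{y}, z\) denote the coordinates of the Gaussian mean in the camera frame. Because the third row of \(J\) is zero, \(\Sigma' \in \mathbb{R}^{2 \times 2}\) corresponds to the upper-left \(2 \times 2\) block of the product.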
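
The projection can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the entries \(t_{x}, t_{y}, z\) of \(J\) are taken to be the camera-frame coordinates of the Gaussian mean, the zero third row of \(J\) is dropped so that the result is directly \(2 \times 2\), and the function name project_covariance is hypothetical.

```python
import numpy as np


def project_covariance(mean, cov, W, t, fx, fy):
    """Approximate 2D covariance of a 3D Gaussian under perspective projection.

    mean: [3] world-space center; cov: [3, 3] world-space covariance.
    W, t: rotation [3, 3] and translation [3] of the world-to-camera transform.
    """
    x, y, z = W @ mean + t  # Gaussian center in camera coordinates
    # First two rows of the Jacobian; the third row is zero and adds nothing.
    J = np.array([
        [fx / z, 0.0, -fx * x / z**2],
        [0.0, fy / z, -fy * y / z**2],
    ])
    return J @ W @ cov @ W.T @ J.T  # [2, 2]


# A Gaussian on the optical axis, two units in front of an identity camera.
cov2d = project_covariance(
    mean=np.array([0.0, 0.0, 2.0]),
    cov=np.diag([0.01, 0.04, 0.09]),
    W=np.eye(3),
    t=np.zeros(3),
    fx=300.0,
    fy=300.0,
)
```

On the optical axis the off-diagonal Jacobian terms vanish, so the 2D covariance is the x-y block of the 3D covariance scaled by \((f/z)^2\).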

rasterization(means: Tensor, quats: Tensor, scales: Tensor, opacities: Tensor, colors: Tensor, viewmats: Tensor, Ks: Tensor, width: int, height: int, near_plane: float = 0.01, far_plane: float = 1e10, radius_clip: float = 0.0, eps2d: float = 0.3, sh_degree: int | None = None, packed: bool = True, tile_size: int = 16, backgrounds: Tensor | None = None, render_mode: typing_extensions.Literal['RGB', 'D', 'ED', 'RGB+D', 'RGB+ED'] = 'RGB', sparse_grad: bool = False, absgrad: bool = False, rasterize_mode: typing_extensions.Literal['classic', 'antialiased'] = 'classic', channel_chunk: int = 32) -> Tuple[Tensor, Tensor, Dict]

Rasterize a set of 3D Gaussians (N) to a batch of image planes (C).

This function provides a handful of features for 3D Gaussian rasterization, which we detail in the following notes. A complete profiling of these features can be found in the Profiling page.

Note

Batch Rasterization: This function allows for rasterizing a set of 3D Gaussians to a batch of images in one go, by simply providing the batched viewmats and Ks.

Note

Support N-D Features: If sh_degree is None, colors is expected to have shape [N, D] or [C, N, D], where D is the number of feature channels to be rendered. The computation is currently slow when D > 32. If sh_degree is set, colors is expected to hold the SH coefficients with shape [N, K, 3] or [C, N, K, 3], where K is the number of SH bases. In this case, it is expected that \((\text{sh\_degree} + 1)^2 \leq K\), where sh_degree controls the activated bases in the SH coefficients.
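
The shape constraint for SH coefficients can be checked up front. The helper below is a hypothetical pre-flight check written for illustration; it is not part of this function and only encodes the inequality stated above.

```python
def check_sh_colors(colors_shape, sh_degree):
    """Validate an SH coefficient shape [..., K, 3] against sh_degree.

    Returns the number of SH bases that sh_degree activates.
    """
    K, channels = colors_shape[-2], colors_shape[-1]
    required = (sh_degree + 1) ** 2  # bases activated by this degree
    if channels != 3:
        raise ValueError(f"expected 3 color channels, got {channels}")
    if K < required:
        raise ValueError(f"sh_degree={sh_degree} needs K >= {required}, got K={K}")
    return required


# 16 stored bases (degree 3) rendered with only degree 2 activated.
active = check_sh_colors((100, 16, 3), sh_degree=2)
```

Here 9 of the 16 stored bases are activated, which matches the statement that sh_degree may be smaller than the degree the coefficients were stored with.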

Note

Depth Rendering: This function supports rendering colors and/or depths via render_mode. The supported modes are “RGB”, “D”, “ED”, “RGB+D”, and “RGB+ED”. “RGB” renders the colored image that respects the colors argument. “D” renders the accumulated z-depth \(\sum_i w_i z_i\). “ED” renders the expected z-depth \(\frac{\sum_i w_i z_i}{\sum_i w_i}\). “RGB+D” and “RGB+ED” render both the colored image and the depth, in which the depth is the last channel of the output.
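
The difference between the two depth modes is easiest to see on a single pixel. The sketch below evaluates both formulas directly; it assumes that \(w_i\) are the per-Gaussian blending weights at that pixel, and the function name render_depths and the zero-coverage guard are illustrative choices, not library behavior.

```python
def render_depths(weights, depths):
    """Accumulated ("D") and expected ("ED") depth for one pixel.

    weights: per-Gaussian blending weights w_i (assumed given).
    depths: per-Gaussian depths z_i.
    """
    accumulated = sum(w * z for w, z in zip(weights, depths))
    total = sum(weights)
    # Guard for pixels that no Gaussian covers (an illustrative choice).
    expected = accumulated / total if total > 0 else 0.0
    return accumulated, expected


d, ed = render_depths(weights=[0.5, 0.25], depths=[1.0, 2.0])
```

The two values coincide only when the weights sum to one, that is, when the pixel is fully opaque; on semi-transparent pixels the accumulated depth is biased toward zero.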

Note

Memory-Speed Trade-off: The packed argument provides a trade-off between memory footprint and runtime. If packed is True, the intermediate results are packed into sparse tensors, which is more memory efficient but might be slightly slower. This is especially helpful when the scene is large and each camera sees only a small portion of the scene. If packed is False, the intermediate results have shape [C, N, …], which is faster but might consume more memory.
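
To make the trade-off concrete, the following NumPy sketch contrasts a dense [C, N] buffer with a packed one that stores only the visible (camera, Gaussian) pairs. The visibility mask is made up, and the layout shown is an assumption about what packing means; the index names simply mirror the camera_ids and gaussian_ids keys that appear in meta.

```python
import numpy as np

# Hypothetical visibility mask: entry [c, n] is True if camera c sees Gaussian n.
visible = np.array([
    [True, False, False, True, False],
    [False, False, True, False, False],
])
C, N = visible.shape

# Unpacked: one slot per (camera, Gaussian) pair, visible or not.
radii_dense = np.zeros((C, N), dtype=np.int32)

# Packed: one slot per visible pair, plus the indices that identify it.
camera_ids, gaussian_ids = np.nonzero(visible)
radii_packed = np.zeros(len(camera_ids), dtype=np.int32)
```

The dense buffer holds C * N = 10 entries while the packed one holds 3, at the cost of carrying two index arrays and an extra level of indirection.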

Note

Sparse Gradients: If sparse_grad is True, the gradients for {means, quats, scales} will be stored in a COO sparse layout. This can be helpful for saving memory for training when the scene is large and each iteration only activates a small portion of the Gaussians. Usually a sparse optimizer is required to work with sparse gradients, such as torch.optim.SparseAdam. This argument is only effective when packed is True.

Note

Speed-up for Large Scenes: The radius_clip argument is extremely helpful for speeding up large-scale scenes or scenes with a large depth of field. Gaussians with a 2D radius smaller than or equal to this value (in pixels) are skipped during rasterization. This skips all the far-away Gaussians that are too small to be seen in the image. Be warned that close-up Gaussians that fall below this threshold are also skipped (which rarely happens in practice). This is disabled by default, with radius_clip set to 0.0.
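
The clipping rule amounts to a simple threshold on the projected radius. The sketch below applies it to a list of radii; the function name keep_after_radius_clip and the sample radii are hypothetical.

```python
def keep_after_radius_clip(radii, radius_clip=0.0):
    """Indices of Gaussians that survive the radius_clip test.

    radii: projected 2D radii in pixels. A Gaussian is skipped when its
    radius is smaller than or equal to radius_clip.
    """
    return [i for i, r in enumerate(radii) if r > radius_clip]


radii = [0.4, 12.0, 2.0, 35.0]
kept = keep_after_radius_clip(radii, radius_clip=2.0)
```

With a threshold of 2.0 pixels, the Gaussians with radii 0.4 and 2.0 are dropped regardless of how close they are to the camera, which is the caveat mentioned above.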

Note

Antialiased Rendering: If rasterize_mode is “antialiased”, the function will apply a view-dependent compensation factor \(\rho=\sqrt{\frac{\det(\Sigma)}{\det(\Sigma+ \epsilon I)}}\) to the Gaussian opacities, where \(\Sigma\) is the projected 2D covariance matrix and \(\epsilon\) is eps2d. This reduces aliasing in the rendered image, as proposed in the paper Mip-Splatting: Alias-free 3D Gaussian Splatting.
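
The compensation factor is cheap to evaluate by hand. The NumPy sketch below computes \(\rho\) for a given 2D covariance; the function name compensation is hypothetical, and the code only evaluates the formula above without claiming to mirror the internal implementation.

```python
import numpy as np


def compensation(cov2d, eps2d=0.3):
    """rho = sqrt(det(Sigma) / det(Sigma + eps2d * I)) for a 2x2 covariance."""
    cov2d = np.asarray(cov2d, dtype=float)
    blurred = cov2d + eps2d * np.eye(2)
    return float(np.sqrt(np.linalg.det(cov2d) / np.linalg.det(blurred)))


rho_small = compensation(np.diag([0.1, 0.1]))      # sub-pixel Gaussian
rho_large = compensation(np.diag([100.0, 100.0]))  # large Gaussian
```

The sub-pixel Gaussian is attenuated to a quarter of its opacity, while the large one is left almost unchanged, so the factor mainly affects Gaussians that would otherwise be inflated by eps2d.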

Note

AbsGrad: If absgrad is True, the absolute gradients of the projected 2D means will be computed during the backward pass and can be accessed via meta["means2d"].absgrad. This is an implementation of the paper AbsGS: Recovering Fine Details for 3D Gaussian Splatting, which is shown to be more effective for splitting Gaussians during training.

Warning

This function is currently not differentiable w.r.t. the camera intrinsics Ks.

Parameters:
  • means – The 3D centers of the Gaussians. [N, 3]

  • quats – The quaternions of the Gaussians. They are not required to be normalized. [N, 4]

  • scales – The scales of the Gaussians. [N, 3]

  • opacities – The opacities of the Gaussians. [N]

  • colors – The colors of the Gaussians. [(C,) N, D] or [(C,) N, K, 3] for SH coefficients.

  • viewmats – The world-to-cam transformation of the cameras. [C, 4, 4]

  • Ks – The camera intrinsics. [C, 3, 3]

  • width – The width of the image.

  • height – The height of the image.

  • near_plane – The near plane for clipping. Default is 0.01.

  • far_plane – The far plane for clipping. Default is 1e10.

  • radius_clip – Gaussians with a 2D radius smaller than or equal to this value will be skipped. This is extremely helpful for speeding up large-scale scenes. Default is 0.0.

  • eps2d – An epsilon added to the eigenvalues of the projected 2D covariance matrices. This prevents the projected Gaussians from becoming too small. For example, eps2d=0.3 leads to a minimal size of 3 pixel units. Default is 0.3.

  • sh_degree – The SH degree to use, which can be smaller than the total number of bands. If set, the colors should be [(C,) N, K, 3] SH coefficients; otherwise the colors should be [(C,) N, D] post-activation color values. Default is None.

  • packed – Whether to use packed mode, which is more memory efficient but might be slightly slower. Default is True.

  • tile_size – The size of the tiles for rasterization. Default is 16. (Note: other values are not tested)

  • backgrounds – The background colors. [C, D]. Default is None.

  • render_mode – The rendering mode. Supported modes are “RGB”, “D”, “ED”, “RGB+D”, and “RGB+ED”. “RGB” renders the colored image, “D” renders the accumulated depth, and “ED” renders the expected depth. Default is “RGB”.

  • sparse_grad – If true, the gradients for {means, quats, scales} will be stored in a COO sparse layout. This can be helpful for saving memory. Default is False.

  • absgrad – If true, the absolute gradients of the projected 2D means will be computed during the backward pass and can be accessed via meta["means2d"].absgrad. Default is False.

  • rasterize_mode – The rasterization mode. Supported modes are “classic” and “antialiased”. Default is “classic”.

  • channel_chunk – The number of channels to render in one go. Default is 32. If more channels than this need to be rendered, the rendering is done in a loop over chunks.

Returns:

render_colors: The rendered colors. [C, height, width, X]. X depends on the render_mode and input colors. If render_mode is “RGB”, X is D; if render_mode is “D” or “ED”, X is 1; if render_mode is “RGB+D” or “RGB+ED”, X is D+1.

render_alphas: The rendered alphas. [C, height, width, 1].

meta: A dictionary of intermediate results of the rasterization.

Return type:

A tuple
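
The rule for the last dimension X of render_colors can be written as a small lookup. The helper below is hypothetical and only restates the cases listed under Returns, which can be handy when allocating buffers or asserting shapes in calling code.

```python
def output_channels(render_mode, D):
    """Size X of the last dimension of render_colors.

    D is the number of color/feature channels being rendered.
    """
    if render_mode == "RGB":
        return D
    if render_mode in ("D", "ED"):
        return 1
    if render_mode in ("RGB+D", "RGB+ED"):
        return D + 1  # depth is appended as the last channel
    raise ValueError(f"unsupported render_mode: {render_mode!r}")


x = output_channels("RGB+ED", D=3)
```

For three color channels with “RGB+ED”, the output has four channels, the last of which is the expected depth.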

Examples:

>>> import torch
>>> from gsplat import rasterization  # assumes this function comes from the gsplat package
>>> device = "cuda"  # a CUDA device is assumed
>>> # define Gaussians
>>> means = torch.randn((100, 3), device=device)
>>> quats = torch.randn((100, 4), device=device)
>>> scales = torch.rand((100, 3), device=device) * 0.1
>>> colors = torch.rand((100, 3), device=device)
>>> opacities = torch.rand((100,), device=device)
>>> # define cameras
>>> viewmats = torch.eye(4, device=device)[None, :, :]
>>> Ks = torch.tensor([
...    [300., 0., 150.], [0., 300., 100.], [0., 0., 1.]], device=device)[None, :, :]
>>> width, height = 300, 200
>>> # render
>>> colors, alphas, meta = rasterization(
...    means, quats, scales, opacities, colors, viewmats, Ks, width, height
... )
>>> print(colors.shape, alphas.shape)
torch.Size([1, 200, 300, 3]) torch.Size([1, 200, 300, 1])
>>> print(meta.keys())
dict_keys(['camera_ids', 'gaussian_ids', 'radii', 'means2d', 'depths', 'conics',
'opacities', 'tile_width', 'tile_height', 'tiles_per_gauss', 'isect_ids',
'flatten_ids', 'isect_offsets', 'width', 'height', 'tile_size'])