Rasterization

3DGS

Given a set of 3D Gaussians parametrized by means \(\mu \in \mathbb{R}^3\), covariances \(\Sigma \in \mathbb{R}^{3 \times 3}\), colors \(c\), and opacities \(o\), we first compute their projected means \(\mu' \in \mathbb{R}^2\) and covariances \(\Sigma' \in \mathbb{R}^{2 \times 2}\) on the image planes. We then group the Gaussians by the image tiles they overlap, sort each group by increasing depth \(z\), and render each pixel within a tile with alpha compositing.
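
For intuition, the per-pixel compositing step can be sketched in a few lines of PyTorch (a minimal illustration, not the tiled CUDA implementation; colors and alphas here are hypothetical per-pixel inputs, already sorted front to back):

>>> import torch
>>> def composite_pixel(colors, alphas):
>>>     # colors: [M, 3], alphas: [M] for the M depth-sorted Gaussians covering the pixel
>>>     pixel, transmittance = torch.zeros(3), 1.0
>>>     for c, a in zip(colors, alphas):
>>>         pixel = pixel + transmittance * a * c  # weight w_i = T_i * alpha_i
>>>         transmittance = transmittance * (1.0 - a)
>>>     return pixel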

Note that the 3D covariances are reparametrized with a scaling matrix \(S = \text{diag}(\mathbf{s}) \in \mathbb{R}^{3 \times 3}\), represented by a scale vector \(\mathbf{s} \in \mathbb{R}^3\), and a rotation matrix \(R \in \mathbb{R}^{3 \times 3}\), represented by a rotation quaternion \(q \in \mathbb{R}^4\):

\[\Sigma = RSS^{T}R^{T}\]
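
This construction is easy to write out per Gaussian (a minimal sketch; quat_scale_to_covar is an illustrative helper, not part of this API, and assumes the wxyz quaternion convention used by this function):

>>> import torch
>>> def quat_scale_to_covar(quat, scale):
>>>     # quat: [4] in wxyz order (normalized here), scale: [3]
>>>     w, x, y, z = torch.nn.functional.normalize(quat, dim=-1)
>>>     R = torch.stack([
>>>         torch.stack([1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)]),
>>>         torch.stack([2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)]),
>>>         torch.stack([2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)]),
>>>     ])
>>>     M = R * scale  # equivalent to R @ diag(scale), i.e. RS
>>>     return M @ M.T  # Sigma = (RS)(RS)^T = R S S^T R^T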

The projection of 3D Gaussians is approximated with the Jacobian of the perspective projection equation:

\[\begin{split}J = \begin{bmatrix} f_{x}/z & 0 & -f_{x} t_{x}/z^{2} \\ 0 & f_{y}/z & -f_{y} t_{y}/z^{2} \\ 0 & 0 & 0 \end{bmatrix}\end{split}\]
\[\Sigma' = J W \Sigma W^{T} J^{T}\]

where \([W | t]\) is the world-to-camera transformation, \((t_x, t_y, z)\) is the Gaussian center in camera coordinates, and \(f_{x}, f_{y}\) are the focal lengths of the camera.
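
Putting the two equations together for a single Gaussian (a hedged sketch; project_covar is a hypothetical helper, and only the first two rows of \(J\) are kept since the third row is zero):

>>> import torch
>>> def project_covar(covar, viewmat, mean, fx, fy):
>>>     # covar: [3, 3] world-space covariance, viewmat: [4, 4] world-to-camera, mean: [3]
>>>     W, t = viewmat[:3, :3], viewmat[:3, 3]
>>>     tx, ty, tz = W @ mean + t  # Gaussian center in camera coordinates
>>>     J = torch.stack([
>>>         torch.stack([fx / tz, torch.zeros(()), -fx * tx / tz**2]),
>>>         torch.stack([torch.zeros(()), fy / tz, -fy * ty / tz**2]),
>>>     ])
>>>     return J @ W @ covar @ W.T @ J.T  # [2, 2] projected covariance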

rasterization(means: Tensor, quats: Tensor, scales: Tensor, opacities: Tensor, colors: Tensor, viewmats: Tensor, Ks: Tensor, width: int, height: int, near_plane: float = 0.01, far_plane: float = 10000000000.0, radius_clip: float = 0.0, eps2d: float = 0.3, sh_degree: int | None = None, packed: bool = True, tile_size: int = 16, backgrounds: Tensor | None = None, render_mode: Literal['RGB', 'D', 'ED', 'RGB+D', 'RGB+ED'] = 'RGB', sparse_grad: bool = False, absgrad: bool = False, rasterize_mode: Literal['classic', 'antialiased'] = 'classic', channel_chunk: int = 32, distributed: bool = False, camera_model: Literal['pinhole', 'ortho', 'fisheye'] = 'pinhole', covars: Tensor | None = None) → Tuple[Tensor, Tensor, Dict]

Rasterize a set of 3D Gaussians (N) to a batch of image planes (C).

This function provides a handful of features for 3D Gaussian rasterization, which we detail in the following notes. A complete profiling of these features can be found in the Profiling page.

Note

Multi-GPU Distributed Rasterization: This function can be used in a multi-GPU distributed scenario by setting distributed to True. In that case, each rank may pass in a subset of the total Gaussians, and the function will collaboratively render a set of images using the Gaussians from all ranks. To achieve balanced computation it is recommended (but not enforced) to assign a similar number of Gaussians to each rank; the number of cameras to be rendered, however, must be the same across ranks. The function returns the rendered images corresponding to the input cameras of each rank, and allows gradients to flow back to Gaussians living on other ranks. For details, please refer to the paper On Scaling Up 3D Gaussian Splatting Training.
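
A rough usage sketch (assuming torch.distributed is already initialized, and where the *_rank variables are hypothetical per-rank tensors holding this rank's Gaussian shard and its own, equally sized, camera batch):

>>> colors, alphas, meta = rasterization(
>>>     means_rank, quats_rank, scales_rank, opacities_rank, colors_rank,
>>>     viewmats_rank, Ks_rank, width, height,
>>>     distributed=True,
>>> )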

Note

Batch Rasterization: This function allows for rasterizing a set of 3D Gaussians to a batch of images in one go, by simply providing the batched viewmats and Ks.

Note

Support N-D Features: If sh_degree is None, colors is expected to have shape [N, D] or [C, N, D], where D is the number of feature channels to be rendered. The computation is slow when D > 32 at the moment. If sh_degree is set, colors is expected to contain SH coefficients with shape [N, K, 3] or [C, N, K, 3], where K is the number of SH bases. In this case, it is expected that \((\textit{sh_degree} + 1) ^ 2 \leq K\), where sh_degree controls the number of activated bases in the SH coefficients.
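
For example, with sh_degree=3 at least \((3+1)^2 = 16\) SH bases must be provided (a sketch reusing the Gaussian and camera tensors from the example below):

>>> sh_degree = 3
>>> K = (sh_degree + 1) ** 2  # minimum number of SH bases: 16
>>> sh_coeffs = torch.rand((100, K, 3), device=device)  # per-Gaussian SH coefficients
>>> renders, alphas, meta = rasterization(
>>>     means, quats, scales, opacities, sh_coeffs, viewmats, Ks, width, height,
>>>     sh_degree=sh_degree,
>>> )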

Note

Depth Rendering: This function supports colors and/or depths via render_mode. The supported modes are “RGB”, “D”, “ED”, “RGB+D”, and “RGB+ED”. “RGB” renders the colored image that respects the colors argument. “D” renders the accumulated z-depth \(\sum_i w_i z_i\). “ED” renders the expected z-depth \(\frac{\sum_i w_i z_i}{\sum_i w_i}\). “RGB+D” and “RGB+ED” render both the colored image and the depth, in which the depth is the last channel of the output.
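
For instance, color and expected depth can be rendered in one pass and split afterwards (a sketch reusing the tensors from the example below):

>>> renders, alphas, meta = rasterization(
>>>     means, quats, scales, opacities, colors, viewmats, Ks, width, height,
>>>     render_mode="RGB+ED",
>>> )
>>> rgb, depth = renders[..., :3], renders[..., 3:]  # depth is the last channel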

Note

Memory-Speed Trade-off: The packed argument provides a trade-off between memory footprint and runtime. If packed is True, the intermediate results are packed into sparse tensors, which is more memory efficient but might be slightly slower. This is especially helpful when the scene is large and each camera sees only a small portion of it. If packed is False, the intermediate results have shape [C, N, …], which is faster but might consume more memory.

Note

Sparse Gradients: If sparse_grad is True, the gradients for {means, quats, scales} will be stored in a COO sparse layout. This can help save memory during training when the scene is large and each iteration only activates a small portion of the Gaussians. Usually a sparse optimizer, such as torch.optim.SparseAdam, is required to work with sparse gradients. This argument is only effective when packed is True.
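
A minimal pairing sketch (reusing the tensors from the example below and assuming the Gaussian parameter tensors are leaf tensors with requires_grad=True):

>>> optimizer = torch.optim.SparseAdam([means, quats, scales], lr=1e-3)
>>> renders, alphas, meta = rasterization(
>>>     means, quats, scales, opacities, colors, viewmats, Ks, width, height,
>>>     packed=True, sparse_grad=True,
>>> )
>>> renders.sum().backward()  # gradients for means/quats/scales arrive in COO layout
>>> optimizer.step()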

Note

Speed-up for Large Scenes: The radius_clip argument is extremely helpful for speeding up large-scale scenes or scenes with a large depth of field. Gaussians with a 2D radius smaller than or equal to this value (in pixel units) will be skipped during rasterization. This skips all far-away Gaussians that are too small to be seen in the image. Be warned that close-up Gaussians below this threshold will also be skipped (which rarely happens in practice). This feature is disabled by default by setting radius_clip to 0.0.

Note

Antialiased Rendering: If rasterize_mode is “antialiased”, the function applies a view-dependent compensation factor \(\rho=\sqrt{\frac{\det(\Sigma)}{\det(\Sigma + \epsilon I)}}\) to the Gaussian opacities, where \(\Sigma\) is the projected 2D covariance matrix and \(\epsilon\) is eps2d. This makes the rendered image more antialiased, as proposed in the paper Mip-Splatting: Alias-free 3D Gaussian Splatting.
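
The compensation factor itself is straightforward to compute (an illustrative per-Gaussian sketch, not the library's internal code):

>>> import torch
>>> def compensation(covar2d, eps2d=0.3):
>>>     # covar2d: [2, 2] projected covariance; rho in (0, 1] scales the opacity
>>>     blurred = covar2d + eps2d * torch.eye(2)
>>>     return torch.sqrt(torch.linalg.det(covar2d) / torch.linalg.det(blurred))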

Note

AbsGrad: If absgrad is True, the absolute gradients of the projected 2D means will be computed during the backward pass, which can be accessed via meta[“means2d”].absgrad. This is an implementation of the paper AbsGS: Recovering Fine Details for 3D Gaussian Splatting, which is shown to be more effective for splitting Gaussians during training.
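
A usage sketch (reusing the tensors from the example below and assuming the Gaussian parameters were created with requires_grad=True):

>>> renders, alphas, meta = rasterization(
>>>     means, quats, scales, opacities, colors, viewmats, Ks, width, height,
>>>     absgrad=True,
>>> )
>>> renders.sum().backward()
>>> abs_grads = meta["means2d"].absgrad  # absolute gradients of the projected 2D means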

Warning

This function is currently not differentiable w.r.t. the camera intrinsics Ks.

Parameters:
  • means – The 3D centers of the Gaussians. [N, 3]

  • quats – The quaternions of the Gaussians (wxyz convention). They are not required to be normalized. [N, 4]

  • scales – The scales of the Gaussians. [N, 3]

  • opacities – The opacities of the Gaussians. [N]

  • colors – The colors of the Gaussians. [(C,) N, D] or [(C,) N, K, 3] for SH coefficients.

  • viewmats – The world-to-cam transformation of the cameras. [C, 4, 4]

  • Ks – The camera intrinsics. [C, 3, 3]

  • width – The width of the image.

  • height – The height of the image.

  • near_plane – The near plane for clipping. Default is 0.01.

  • far_plane – The far plane for clipping. Default is 1e10.

  • radius_clip – Gaussians with a 2D radius smaller than or equal to this value will be skipped. This is extremely helpful for speeding up large-scale scenes. Default is 0.0.

  • eps2d – An epsilon added to the eigenvalues of the projected 2D covariance matrices. This prevents projected Gaussians from becoming too small. For example, eps2d=0.3 leads to a minimal size of 3 pixel units. Default is 0.3.

  • sh_degree – The SH degree to use, which can be smaller than the total number of bands. If set, colors should be [(C,) N, K, 3] SH coefficients; otherwise colors should be [(C,) N, D] post-activation color values. Default is None.

  • packed – Whether to use packed mode which is more memory efficient but might or might not be as fast. Default is True.

  • tile_size – The size of the tiles for rasterization. Default is 16. (Note: other values are not tested)

  • backgrounds – The background colors. [C, D]. Default is None.

  • render_mode – The rendering mode. Supported modes are “RGB”, “D”, “ED”, “RGB+D”, and “RGB+ED”. “RGB” renders the colored image, “D” renders the accumulated depth, and “ED” renders the expected depth. Default is “RGB”.

  • sparse_grad – If true, the gradients for {means, quats, scales} will be stored in a COO sparse layout. This can be helpful for saving memory. Default is False.

  • absgrad – If true, the absolute gradients of the projected 2D means will be computed during the backward pass, which could be accessed by meta[“means2d”].absgrad. Default is False.

  • rasterize_mode – The rasterization mode. Supported modes are “classic” and “antialiased”. Default is “classic”.

  • channel_chunk – The number of channels to render in one go. Default is 32. If the required rendering channels exceed this value, the rendering will be done in a loop over chunks.

  • distributed – Whether to use distributed rendering. Default is False. If True, the input Gaussians are expected to be a subset of the scene on each rank, and the function will collaboratively render the images for all ranks.

  • camera_model – The camera model to use. Supported models are “pinhole”, “ortho”, and “fisheye”. Default is “pinhole”.

  • covars – Optional covariance matrices of the Gaussians. If provided, quats and scales will be ignored. [N, 3, 3]. Default is None.

Returns:

render_colors: The rendered colors. [C, height, width, X]. X depends on the render_mode and input colors. If render_mode is “RGB”, X is D; if render_mode is “D” or “ED”, X is 1; if render_mode is “RGB+D” or “RGB+ED”, X is D+1.

render_alphas: The rendered alphas. [C, height, width, 1].

meta: A dictionary of intermediate results of the rasterization.

Return type:

A tuple

Examples:

>>> import torch
>>> device = torch.device("cuda")  # assuming a CUDA-capable device
>>> # define Gaussians
>>> means = torch.randn((100, 3), device=device)
>>> quats = torch.randn((100, 4), device=device)
>>> scales = torch.rand((100, 3), device=device) * 0.1
>>> colors = torch.rand((100, 3), device=device)
>>> opacities = torch.rand((100,), device=device)
>>> # define cameras
>>> viewmats = torch.eye(4, device=device)[None, :, :]
>>> Ks = torch.tensor([
>>>    [300., 0., 150.], [0., 300., 100.], [0., 0., 1.]], device=device)[None, :, :]
>>> width, height = 300, 200
>>> # render
>>> colors, alphas, meta = rasterization(
>>>    means, quats, scales, opacities, colors, viewmats, Ks, width, height
>>> )
>>> print(colors.shape, alphas.shape)
torch.Size([1, 200, 300, 3]) torch.Size([1, 200, 300, 1])
>>> print(meta.keys())
dict_keys(['camera_ids', 'gaussian_ids', 'radii', 'means2d', 'depths', 'conics',
'opacities', 'tile_width', 'tile_height', 'tiles_per_gauss', 'isect_ids',
'flatten_ids', 'isect_offsets', 'width', 'height', 'tile_size'])

2DGS

Given a set of 2D Gaussians parametrized by means \(\mu \in \mathbb{R}^3\), two principal tangent vectors embedded as the first two columns of a rotation matrix \(R \in \mathbb{R}^{3\times3}\), and a scale matrix \(S \in \mathbb{R}^{3\times3}\) representing the scaling along the two principal tangential directions, we first transform pixels into the splats' local tangent frame via \((WH)^{-1} \in \mathbb{R}^{4\times4}\) and compute weights via ray-splat intersection. We then follow a sorting and rendering procedure similar to 3DGS.

Note that \(H\) is the transformation from the splat's local tangent plane \(\{u, v\}\) into world space

\[\begin{split}H = \begin{bmatrix} RS & \mu \\ 0 & 1 \end{bmatrix}\end{split}\]

and \(W \in \mathbb{R}^{4\times4}\) is the transformation matrix from world space to image space.

Splatting is done via ray-splat plane intersection. Each pixel is represented by an x-plane \(h_{x}=(-1, 0, 0, x)^{T}\) and a y-plane \(h_{y}=(0, -1, 0, y)^{T}\), and the intersection between a splat and the pixel \(p=(x, y)\) is defined as the intersection between the x-plane, the y-plane, and the splat's tangent plane. We first transform \(h_{x}\) to \(h_{u}\) and \(h_{y}\) to \(h_{v}\) in the splat's tangent frame via the inverse transformation \((WH)^{-1}\). Since the intersection point must lie on both \(h_{u}\) and \(h_{v}\), there is an efficient closed-form solution:

\[u(p) = \frac{h^{2}_{u}h^{4}_{v}-h^{4}_{u}h^{2}_{v}}{h^{1}_{u}h^{2}_{v}-h^{2}_{u}h^{1}_{v}}, v(p) = \frac{h^{4}_{u}h^{1}_{v}-h^{1}_{u}h^{4}_{v}}{h^{1}_{u}h^{2}_{v}-h^{2}_{u}h^{1}_{v}}\]
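
In code, this solve is only a few lines (a hedged sketch; intersect_uv is an illustrative helper, h_u and h_v are the plane 4-vectors already expressed in the tangent frame, and the 1-based superscripts above map to 0-based indices here):

>>> def intersect_uv(h_u, h_v):
>>>     denom = h_u[0] * h_v[1] - h_u[1] * h_v[0]
>>>     u = (h_u[1] * h_v[3] - h_u[3] * h_v[1]) / denom
>>>     v = (h_u[3] * h_v[0] - h_u[0] * h_v[3]) / denom
>>>     return u, v
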
rasterization_2dgs(means: Tensor, quats: Tensor, scales: Tensor, opacities: Tensor, colors: Tensor, viewmats: Tensor, Ks: Tensor, width: int, height: int, near_plane: float = 0.01, far_plane: float = 10000000000.0, radius_clip: float = 0.0, eps2d: float = 0.3, sh_degree: int | None = None, packed: bool = False, tile_size: int = 16, backgrounds: Tensor | None = None, render_mode: Literal['RGB', 'D', 'ED', 'RGB+D', 'RGB+ED'] = 'RGB', sparse_grad: bool = False, absgrad: bool = False, distloss: bool = False, depth_mode: Literal['expected', 'median'] = 'expected') → Tuple[Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Dict]

Rasterize a set of 2D Gaussians (N) to a batch of image planes (C).

This function supports a handful of features, similar to the rasterization() function.

Warning

This function is currently not differentiable w.r.t. the camera intrinsics Ks.

Parameters:
  • means – The 3D centers of the Gaussians. [N, 3]

  • quats – The quaternions of the Gaussians (wxyz convention). They are not required to be normalized. [N, 4]

  • scales – The scales of the Gaussians. [N, 3]

  • opacities – The opacities of the Gaussians. [N]

  • colors – The colors of the Gaussians. [(C,) N, D] or [(C,) N, K, 3] for SH coefficients.

  • viewmats – The world-to-cam transformation of the cameras. [C, 4, 4]

  • Ks – The camera intrinsics. [C, 3, 3]

  • width – The width of the image.

  • height – The height of the image.

  • near_plane – The near plane for clipping. Default is 0.01.

  • far_plane – The far plane for clipping. Default is 1e10.

  • radius_clip – Gaussians with a 2D radius smaller than or equal to this value will be skipped. This is extremely helpful for speeding up large-scale scenes. Default is 0.0.

  • eps2d – An epsilon added to the eigenvalues of the projected 2D covariance matrices. This prevents projected Gaussians from becoming too small. For example, eps2d=0.3 leads to a minimal size of 3 pixel units. Default is 0.3.

  • sh_degree – The SH degree to use, which can be smaller than the total number of bands. If set, colors should be [(C,) N, K, 3] SH coefficients; otherwise colors should be [(C,) N, D] post-activation color values. Default is None.

  • packed – Whether to use packed mode, which is more memory efficient but might or might not be as fast. Default is False.

  • tile_size – The size of the tiles for rasterization. Default is 16. (Note: other values are not tested)

  • backgrounds – The background colors. [C, D]. Default is None.

  • render_mode – The rendering mode. Supported modes are “RGB”, “D”, “ED”, “RGB+D”, and “RGB+ED”. “RGB” renders the colored image, “D” renders the accumulated depth, and “ED” renders the expected depth. Default is “RGB”.

  • sparse_grad (Experimental) – If true, the gradients for {means, quats, scales} will be stored in a COO sparse layout. This can be helpful for saving memory. Default is False.

  • absgrad – If true, the absolute gradients of the projected 2D means will be computed during the backward pass, which could be accessed by meta[“means2d”].absgrad. Default is False.

  • channel_chunk – The number of channels to render in one go. Default is 32. If the required rendering channels exceed this value, the rendering will be done in a loop over chunks.

  • distloss – If true, use distortion regularization to obtain better geometry detail. Default is False.

  • depth_mode – The depth rendering mode. Choose between “expected” and “median” depth. Default is “expected”.

Returns:

render_colors: The rendered colors. [C, height, width, X]. X depends on the render_mode and input colors. If render_mode is “RGB”, X is D; if render_mode is “D” or “ED”, X is 1; if render_mode is “RGB+D” or “RGB+ED”, X is D+1.

render_alphas: The rendered alphas. [C, height, width, 1].

render_normals: The rendered normals. [C, height, width, 3].

surf_normals: The surface normals computed from the rendered depth. [C, height, width, 3].

render_distort: The rendered distortions. [C, height, width, 1]. This is the L1 version, which differs from the L2 version used in the 2DGS paper.

render_median: The rendered median depth. [C, height, width, 1].

meta: A dictionary of intermediate results of the rasterization.

Return type:

A tuple

Examples:

>>> import torch
>>> device = torch.device("cuda")  # assuming a CUDA-capable device
>>> # define Gaussians
>>> means = torch.randn((100, 3), device=device)
>>> quats = torch.randn((100, 4), device=device)
>>> scales = torch.rand((100, 3), device=device) * 0.1
>>> colors = torch.rand((100, 3), device=device)
>>> opacities = torch.rand((100,), device=device)
>>> # define cameras
>>> viewmats = torch.eye(4, device=device)[None, :, :]
>>> Ks = torch.tensor([
>>>    [300., 0., 150.], [0., 300., 100.], [0., 0., 1.]], device=device)[None, :, :]
>>> width, height = 300, 200
>>> # render
>>> colors, alphas, normals, surf_normals, distort, median_depth, meta = rasterization_2dgs(
>>>    means, quats, scales, opacities, colors, viewmats, Ks, width, height
>>> )
>>> print(colors.shape, alphas.shape)
torch.Size([1, 200, 300, 3]) torch.Size([1, 200, 300, 1])
>>> print(normals.shape, surf_normals.shape)
torch.Size([1, 200, 300, 3]) torch.Size([1, 200, 300, 3])
>>> print(distort.shape, median_depth.shape)
torch.Size([1, 200, 300, 1]) torch.Size([1, 200, 300, 1])
>>> print(meta.keys())
dict_keys(['camera_ids', 'gaussian_ids', 'radii', 'means2d', 'depths', 'ray_transforms',
'opacities', 'normals', 'tile_width', 'tile_height', 'tiles_per_gauss', 'isect_ids',
'flatten_ids', 'isect_offsets', 'width', 'height', 'tile_size', 'n_cameras', 'render_distort',
'gradient_2dgs'])