gsplat on DGX Spark
gsplat is a CUDA-accelerated differentiable rasterizer for 3D Gaussian Splatting. This page explains what it does, how the pipeline works, how to use the API, and how to run it reliably on DGX Spark.
Overview
gsplat is a rasterization engine for 3D Gaussian Splatting. When you train a scene with nerfstudio, gsplat is what actually renders the Gaussians into pixels, on every training iteration and in real time during inference. It fills the same role that the diff-gaussian-rasterization module fills in the original 3DGS paper code.
It takes a set of 3D Gaussian ellipsoids, each with a position, size, rotation, color, and opacity, and renders them into a 2D image from any camera viewpoint.
It's differentiable — gradients flow back through the rasterizer to the Gaussian parameters. This is what makes training possible. You can optimize the scene from 2D photo supervision.
Compared to the official INRIA code, gsplat reports up to 4x lower training memory use and 15% shorter training time on Mip-NeRF 360 captures, along with significantly faster rendering of large scenes.
Contributors come from UC Berkeley, NVIDIA, Meta, LumaAI, CMU, ShanghaiTech, and Amazon. The project grew out of the nerfstudio team's curiosity about 3DGS and is now the de facto standard rasterizer.
Where gsplat fits
How 3DGS rasterization works
The rasterization pipeline has four main stages. Each maps clearly to concepts from traditional real-time rendering.
Represent the scene as 3D Gaussians
Each Gaussian is defined by:
means μ ∈ ℝ³: 3D position in world space
scales s ∈ ℝ³: size along each axis (forms the covariance)
quaternion q ∈ ℝ⁴: rotation of the ellipsoid
opacity o ∈ [0,1]: how opaque the Gaussian is
color c: spherical harmonic coefficients for view-dependent color
The covariance matrix is reconstructed as Σ = R·S·Sᵀ·Rᵀ where R is from the quaternion and S is diagonal from scales.
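The reconstruction Σ = R·S·Sᵀ·Rᵀ can be sketched in plain Python. This is an illustration only, not gsplat's CUDA code: the helper names are hypothetical, and the quaternion is assumed to be ordered (w, x, y, z).

```python
import math

def quat_to_rotmat(q):
    """Convert a quaternion, assumed ordered (w, x, y, z), to a 3x3 rotation matrix."""
    w, x, y, z = q
    n = math.sqrt(w * w + x * x + y * y + z * z)  # normalize defensively
    w, x, y, z = w / n, x / n, y / n, z / n
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]

def covariance(q, s):
    """Return Sigma = R S S^T R^T, where S = diag(s)."""
    R = quat_to_rotmat(q)
    # M = R S: column k of R scaled by s[k]
    M = [[R[i][k] * s[k] for k in range(3)] for i in range(3)]
    # Sigma = M M^T, symmetric and positive semi-definite by construction
    return [[sum(M[i][k] * M[j][k] for k in range(3)) for j in range(3)]
            for i in range(3)]

# Identity rotation with scales (1, 2, 3) gives diag(1, 4, 9)
sigma = covariance((1.0, 0.0, 0.0, 0.0), (1.0, 2.0, 3.0))
print(sigma)
```

Storing a rotation and a scale instead of Σ itself keeps the covariance valid no matter how the optimizer updates the parameters.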
Project to 2D image plane
Each 3D Gaussian is projected onto the camera's image plane using the Jacobian of the perspective projection:
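J = [ fx/z   0      -fx·x/z² ]
    [ 0      fy/z   -fy·y/z² ]
    [ 0      0       0       ]
Σ' = J · W · Σ · Wᵀ · Jᵀ
Here (x, y, z) is the Gaussian center in camera space, W is the rotation part (upper-left 3×3) of the world-to-camera transform, and fx, fy are the focal lengths in pixels. The upper-left 2×2 block of Σ' is the covariance of the 2D Gaussian "splat" on screen.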
For a rendering engineer: this is the same kind of projection math used in any rasterizer — the twist is that instead of triangles, you're projecting ellipsoids.
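As a rough sketch of the projection step, the following plain-Python function computes the 2×2 screen-space covariance for one Gaussian. The function and argument names are hypothetical, only the two non-zero rows of J are used, and refinements that production rasterizers add (such as clamping and a small screen-space blur) are left out.

```python
def matmul(A, B):
    """Multiply two matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def project_covariance(sigma, W, center_cam, fx, fy):
    """Return the 2x2 covariance of the projected splat.

    sigma:      3x3 world-space covariance
    W:          3x3 rotation part of the world-to-camera transform
    center_cam: (x, y, z) Gaussian center in camera space, with z > 0
    """
    x, y, z = center_cam
    # First two rows of the perspective Jacobian; the third row is zero
    J = [
        [fx / z, 0.0, -fx * x / (z * z)],
        [0.0, fy / z, -fy * y / (z * z)],
    ]
    JW = matmul(J, W)
    return matmul(matmul(JW, sigma), transpose(JW))

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
sigma = [[0.01, 0.0, 0.0], [0.0, 0.01, 0.0], [0.0, 0.0, 0.01]]
cov2d = project_covariance(sigma, identity, (0.0, 0.0, 2.0), fx=100.0, fy=100.0)
print(cov2d)
```

For an isotropic Gaussian on the optical axis, the result is isotropic too, with variance scaled by (f/z)².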
Tile-based sorting
The screen is divided into 16×16 pixel tiles. For each tile, all Gaussians that overlap it are collected and sorted by depth (front to back). This is what makes the renderer GPU-friendly — tiles map naturally to thread blocks.
For rendering engineers: this is conceptually close to tiled deferred techniques and tile-based transparency workflows in real-time engines.
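The binning and sorting step can be illustrated on the CPU. This sketch assumes each splat is described by a hypothetical dict with a screen-space center, a bounding radius in pixels, and a camera-space depth. GPU implementations typically replace the per-tile sort with one parallel sort over combined tile and depth keys.

```python
from collections import defaultdict

TILE = 16  # tile edge in pixels, matching the 16x16 tiles described above

def bin_and_sort(splats, width, height):
    """Map each tile (tx, ty) to the indices of overlapping splats, nearest first."""
    tiles_x = (width + TILE - 1) // TILE
    tiles_y = (height + TILE - 1) // TILE
    bins = defaultdict(list)
    for idx, s in enumerate(splats):
        # Tile range covered by the splat's bounding square, clamped to the screen
        x0 = max(int((s["cx"] - s["radius"]) // TILE), 0)
        x1 = min(int((s["cx"] + s["radius"]) // TILE), tiles_x - 1)
        y0 = max(int((s["cy"] - s["radius"]) // TILE), 0)
        y1 = min(int((s["cy"] + s["radius"]) // TILE), tiles_y - 1)
        for ty in range(y0, y1 + 1):
            for tx in range(x0, x1 + 1):
                bins[(tx, ty)].append(idx)
    # Front-to-back order within every tile
    for key in bins:
        bins[key].sort(key=lambda i: splats[i]["depth"])
    return dict(bins)

splats = [
    {"cx": 8.0, "cy": 8.0, "radius": 4.0, "depth": 5.0},
    {"cx": 16.0, "cy": 8.0, "radius": 4.0, "depth": 2.0},
]
bins = bin_and_sort(splats, width=64, height=32)
print(bins)
```

The second splat straddles a tile boundary, so it appears in two tiles; in the shared tile it comes first because it is closer to the camera.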
Alpha compositing per pixel
Within each tile, Gaussians are alpha-composited front-to-back using the standard over operator:
C = Σᵢ cᵢ · αᵢ · Πⱼ<ᵢ (1 - αⱼ)
where αᵢ = oᵢ · exp(-0.5 · (x-μᵢ)ᵀ Σ'⁻¹ (x-μᵢ))
For rendering engineers: this is the same order-dependent transparency problem as OIT. 3DGS handles it primarily through per-tile depth sorting.
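A minimal single-pixel sketch of the two formulas above, in plain Python. The function names are hypothetical, the inverse 2D covariance is passed as three numbers (a, b, c) for the symmetric matrix [[a, b], [b, c]], and optimizations such as stopping once the remaining transmittance is negligible are omitted.

```python
import math

def gaussian_alpha(opacity, mean2d, conic, pixel):
    """alpha = o * exp(-0.5 * d^T Sigma'^-1 d), with d = pixel - mean2d."""
    dx = pixel[0] - mean2d[0]
    dy = pixel[1] - mean2d[1]
    a, b, c = conic  # inverse 2D covariance [[a, b], [b, c]]
    power = -0.5 * (a * dx * dx + 2.0 * b * dx * dy + c * dy * dy)
    return opacity * math.exp(power)

def composite(colors, alphas, channels=3):
    """Front-to-back over operator for one pixel; inputs sorted nearest first."""
    out = [0.0] * channels
    transmittance = 1.0  # product of (1 - alpha_j) over splats already blended
    for color, alpha in zip(colors, alphas):
        weight = alpha * transmittance
        out = [o + weight * ch for o, ch in zip(out, color)]
        transmittance *= 1.0 - alpha
    return out, transmittance

# A red splat in front of a blue one, both with alpha 0.5 at this pixel
pixel_color, remaining = composite([(1.0, 0.0, 0.0), (0.0, 0.0, 1.0)], [0.5, 0.5])
print(pixel_color, remaining)  # [0.5, 0.0, 0.25] 0.25
```

The nearer splat contributes half of the pixel, and the farther one only gets the half of the light that is left, which is why sorting order matters.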
Python API
gsplat exposes a clean Python API. The main function is rasterization() — everything else builds on top of it.
Core rasterization call
from gsplat import rasterization
renders, alphas, info = rasterization(
    means,      # [N, 3] Gaussian centers in world space
    quats,      # [N, 4] rotation quaternions (normalized)
    scales,     # [N, 3] scale per axis
    opacities,  # [N] opacity values 0-1
    colors,     # [N, D] per-Gaussian colors or SH coefficients
    viewmats,   # [C, 4, 4] world-to-camera transforms
    Ks,         # [C, 3, 3] camera intrinsics
    width,      # int image width in pixels
    height,     # int image height in pixels
)
What comes back
renders, [C, H, W, D]: rendered images for C cameras, H×W pixels, D channels.
alphas, [C, H, W, 1]: accumulated alpha (opacity) per pixel. Useful for compositing.
info, dict: auxiliary data such as per-Gaussian visibility and tile statistics, useful for densification strategies.
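As an illustration of the compositing use mentioned for alphas, the sketch below blends one rendered image over a constant background. It assumes the rendered colors are already alpha-weighted, as the compositing sum earlier on this page implies, and it uses nested lists for a single camera so it runs without a GPU. The function name is hypothetical; on tensors the same expression would be applied elementwise.

```python
def over_background(render, alpha, background):
    """Blend an alpha-weighted render over a constant background color.

    render:     [H][W][D] accumulated colors for one camera
    alpha:      [H][W][1] accumulated alpha per pixel
    background: length-D color
    """
    return [
        [
            # Background shows through wherever accumulated alpha is below 1
            [c + (1.0 - a[0]) * b for c, b in zip(px, background)]
            for px, a in zip(row, alpha_row)
        ]
        for row, alpha_row in zip(render, alpha)
    ]

render = [[[0.5, 0.0, 0.0], [0.0, 0.0, 0.0]]]  # 1x2 image, 3 channels
alpha = [[[0.5], [0.0]]]                       # half covered, then empty
final = over_background(render, alpha, (1.0, 1.0, 1.0))
print(final)  # [[[1.0, 0.5, 0.5], [1.0, 1.0, 1.0]]]
```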
Minimal working example
import torch
from gsplat import rasterization
N = 10000 # number of Gaussians
C = 1 # number of cameras
H, W = 480, 640
# Random scene (replace with loaded .ply data)
means = torch.randn(N, 3, device="cuda")
quats = torch.randn(N, 4, device="cuda")
quats = quats / quats.norm(dim=-1, keepdim=True) # normalize
scales = torch.rand(N, 3, device="cuda") * 0.1
opacities = torch.rand(N, device="cuda")
colors = torch.rand(N, 3, device="cuda")
# Identity world-to-camera transform: camera at the origin.
# gsplat follows the OpenCV convention, so the camera looks down +Z.
viewmats = torch.eye(4, device="cuda").unsqueeze(0) # [1, 4, 4]
Ks = torch.tensor([[[W, 0, W/2],
                    [0, H, H/2],
                    [0, 0, 1]]], device="cuda", dtype=torch.float32)
renders, alphas, info = rasterization(
    means, quats, scales, opacities, colors,
    viewmats, Ks, width=W, height=H
)
print("Rendered shape:", renders.shape) # [1, 480, 640, 3]
print("Done.")
Loading a .ply scene file
import numpy as np
import torch
from plyfile import PlyData

def load_ply(path):
    """Load Gaussian parameters from a 3DGS-style .ply file onto the GPU."""
    ply = PlyData.read(path)
    v = ply['vertex']
    means = torch.tensor(
        np.stack([v['x'], v['y'], v['z']], axis=1),
        dtype=torch.float32, device='cuda'
    )
    # Scales are stored as logs; exponentiate to get per-axis sizes
    scales = torch.tensor(
        np.exp(np.stack([v['scale_0'], v['scale_1'], v['scale_2']], axis=1)),
        dtype=torch.float32, device='cuda'
    )
    quats = torch.tensor(
        np.stack([v['rot_0'], v['rot_1'], v['rot_2'], v['rot_3']], axis=1),
        dtype=torch.float32, device='cuda'
    )
    # Opacities are stored as logits; a sigmoid maps them to 0-1
    opacities = torch.tensor(
        1 / (1 + np.exp(-v['opacity'])),
        dtype=torch.float32, device='cuda'
    )
    return means, scales, quats, opacities

# Note the return order: scales come before quats here, unlike in rasterization()
means, scales, quats, opacities = load_ply('/workspace/garden.ply')
print("Loaded", means.shape[0], "Gaussians")
gsplat on DGX Spark
On ARM64 + CUDA 13, install from source in the NVIDIA PyTorch container for consistent results.
The problem
pip install gsplat: no pre-compiled ARM64 + CUDA 13 wheel on PyPI.
TORCH_CUDA_ARCH_LIST="12.1": PyTorch 2.7 in the NVIDIA container doesn't know arch 12.1; use 12.0.
pip install gsplat without --no-build-isolation: pip tries to install a conflicting PyTorch, overwriting NVIDIA's custom build.
Working install (verified Apr 2026)
# Inside nvcr.io/nvidia/pytorch:25.03-py3 container
# DGX Spark · GB10 · CUDA 13.0 · ARM64
export TORCH_CUDA_ARCH_LIST="12.0"
pip install git+https://github.com/nerfstudio-project/gsplat.git \
    --no-build-isolation
# Verify
python3 -c "import gsplat; print('gsplat OK', gsplat.__version__)"
# → gsplat OK 1.5.3
Save as container image
exit
docker commit gsplat-build vertexnova/gsplat-spark:v1
docker images | grep vertexnova
Quick GPU sanity check inside container
python3 -c "
import torch
import gsplat
print('PyTorch:', torch.__version__)
print('CUDA available:', torch.cuda.is_available())
print('GPU:', torch.cuda.get_device_name(0))
print('gsplat:', gsplat.__version__)
# Quick rasterization smoke test
N = 1000
means = torch.randn(N, 3, device='cuda')
quats = torch.randn(N, 4, device='cuda')
quats = quats / quats.norm(dim=-1, keepdim=True)
scales = torch.rand(N, 3, device='cuda') * 0.05
opacities = torch.rand(N, device='cuda')
colors = torch.rand(N, 3, device='cuda')
viewmats = torch.eye(4, device='cuda').unsqueeze(0)
Ks = torch.tensor([[[640,0,320],[0,480,240],[0,0,1]]],
device='cuda', dtype=torch.float32)
renders, alphas, _ = gsplat.rasterization(
means, quats, scales, opacities, colors,
viewmats, Ks, width=640, height=480
)
print('Render shape:', renders.shape)
print('ALL OK')
"
Integration plan for VertexNova
Phased plan for moving from validated Python usage to native engine integration.
gsplat installed on DGX Spark
gsplat 1.5.3 running on GB10 Blackwell, ARM64, CUDA 13. Saved as vertexnova/gsplat-spark:v1.
First scene rendering
Download the pre-trained garden.ply scene. Render it from a fixed camera using gsplat.rasterization(), save the result as a PNG, and inspect what the output looks like.
vne3dgs module in VertexNova
New module in VertexNova that loads a .ply file, parses Gaussian attributes, and passes them to a render pass. Start with OpenGL, port to Vulkan later.
Vulkan 3DGS rasterizer
Implement tile-based Gaussian rasterization in Vulkan compute shaders. Replace the Python/CUDA path with a native C++ implementation inside VertexNova's RHI.
Surgical scene reconstruction
Capture surgical anatomy from multiple views → COLMAP → gsplat training → real-time Vulkan render. A novel combination of OR rendering expertise and 3DGS.