gsplat on DGX Spark

gsplat is a CUDA-accelerated differentiable rasterizer for 3D Gaussian Splatting. This page explains what it does, how the pipeline works, how to use the API, and how to run it reliably on DGX Spark.

gsplat 1.5.3 · PyTorch 2.7 · CUDA 13 · ARM64 · memory-efficient rasterizer

Overview

gsplat is the rasterization engine at the core of many 3D Gaussian Splatting pipelines. When you train a scene with nerfstudio, or with the original INRIA 3DGS code after swapping gsplat in for its bundled diff-gaussian-rasterization module, gsplat is what renders the Gaussians into pixels: at least once per training iteration, and in real time during inference.

What it does

Takes a set of 3D Gaussian ellipsoids — each with a position, size, rotation, color, and opacity — and renders them into a 2D image from any camera viewpoint.

Why it matters

It's differentiable — gradients flow back through the rasterizer to the Gaussian parameters. This is what makes training possible. You can optimize the scene from 2D photo supervision.

vs. original implementation

Up to 4x less GPU memory during training and about 15% less training time on Mip-NeRF 360 captures than the official INRIA code, as reported by the gsplat project. Rendering of large scenes is also significantly faster.

Who develops it

Contributors come from UC Berkeley, NVIDIA, Meta, LumaAI, CMU, ShanghaiTech, and Amazon. The project grew out of the nerfstudio team's curiosity about 3DGS and is now a widely used 3DGS rasterizer.

Where gsplat fits

Your application / VertexNova
↑ rendered images / ↓ camera poses
nerfstudio
original 3DGS (INRIA)
↑ gradients / ↓ Gaussian params
gsplat — rasterizer
↑ / ↓
CUDA kernels on GPU
gsplat is the bridge between Gaussian scene parameters and rendered pixels during both training and inference.

How 3DGS rasterization works

The rasterization pipeline has four main stages. Each maps clearly to concepts from traditional real-time rendering.

01

Represent the scene as 3D Gaussians

Each Gaussian is defined by:

  • means μ ∈ ℝ³ — 3D position in world space
  • scales s ∈ ℝ³ — size along each axis (forms the covariance)
  • quaternion q ∈ ℝ⁴ — rotation of the ellipsoid
  • opacity o ∈ [0,1] — how opaque the Gaussian is (0 is fully transparent, 1 is fully opaque)
  • color c — spherical harmonic coefficients for view-dependent color

The covariance matrix is reconstructed as Σ = R·S·Sᵀ·Rᵀ where R is from the quaternion and S is diagonal from scales.
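The reconstruction above can be sketched in a few lines of plain Python. This is an illustrative sketch, not gsplat's CUDA code: the helper names (`quat_to_rotmat`, `matmul`, `transpose`, `covariance_3d`) are made up for this page, and the quaternion is assumed to be in (w, x, y, z) order.

```python
import math

def quat_to_rotmat(q):
    """Rotation matrix from a quaternion given in (w, x, y, z) order."""
    w, x, y, z = q
    n = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / n, x / n, y / n, z / n  # normalize defensively
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]

def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def covariance_3d(quat, scales):
    """Sigma = R S S^T R^T, computed as M M^T with M = R S."""
    R = quat_to_rotmat(quat)
    S = [[scales[0], 0.0, 0.0], [0.0, scales[1], 0.0], [0.0, 0.0, scales[2]]]
    M = matmul(R, S)
    return matmul(M, transpose(M))

# Identity rotation: the covariance is diagonal with the squared scales.
cov_axis_aligned = covariance_3d((1.0, 0.0, 0.0, 0.0), (0.1, 0.2, 0.3))

# 90 degree rotation about z: the x and y variances swap places.
h = math.sqrt(0.5)
cov_rotated = covariance_3d((h, 0.0, 0.0, h), (0.1, 0.2, 0.3))
```

Building Σ as M·Mᵀ keeps it symmetric and positive semi-definite for any quaternion and any scales, which is why the scene is parameterized by rotation and scale rather than by the six free entries of Σ directly.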

02

Project to 2D image plane

Each 3D Gaussian is projected onto the camera's image plane using the Jacobian of the perspective projection:

J = [fx/z,    0,  -fx·x/z²]
    [0,    fy/z, -fy·y/z²]
    [0,       0,         0]

Σ' = J · W · Σ · Wᵀ · Jᵀ

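Here W is the 3×3 rotation part of the world-to-camera transform, (x, y, z) is the Gaussian center in camera space, and fx, fy are the focal lengths in pixels. Because the perspective projection is not linear, J is a local affine approximation around the center. The upper-left 2×2 block of Σ' is the covariance of the 2D Gaussian "splat" on screen.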

For a rendering engineer: this is the same kind of projection math used in any rasterizer — the twist is that instead of triangles, you're projecting ellipsoids.
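As a rough illustration of the projection step, the sketch below evaluates Σ' = J · W · Σ · Wᵀ · Jᵀ with nested lists. It uses only the first two rows of J, since the third row is zero, so the result is already the 2×2 screen-space covariance. The function name `project_covariance` and its helpers are hypothetical, and any extra handling a real rasterizer applies (near-plane culling, clamping, added blur) is left out.

```python
def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def project_covariance(cov_world, W, mean_cam, fx, fy):
    """Return the 2x2 screen-space covariance of one Gaussian.

    cov_world: 3x3 world-space covariance
    W:         3x3 rotation part of the world-to-camera transform
    mean_cam:  Gaussian center (x, y, z) in camera space, with z > 0
    """
    x, y, z = mean_cam
    # First two rows of the perspective Jacobian (the third row is zero).
    J = [[fx / z, 0.0, -fx * x / (z * z)],
         [0.0, fy / z, -fy * y / (z * z)]]
    T = matmul(J, W)  # 2x3
    return matmul(matmul(T, cov_world), transpose(T))

# Isotropic Gaussian with standard deviation 0.1, on the optical axis, 2 units away.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
cov_world = [[0.01, 0.0, 0.0], [0.0, 0.01, 0.0], [0.0, 0.0, 0.01]]
cov2d = project_covariance(cov_world, identity, (0.0, 0.0, 2.0), fx=500.0, fy=500.0)
```

With these numbers the splat variance is (500 / 2)² · 0.01 = 625 px², a standard deviation of 25 pixels. Doubling the distance halves the on-screen standard deviation, as expected from a pinhole camera.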

03

Tile-based sorting

The screen is divided into 16×16 pixel tiles. For each tile, all Gaussians that overlap it are collected and sorted by depth (front to back). This is what makes the renderer GPU-friendly — tiles map naturally to thread blocks.

For rendering engineers: this is conceptually close to tiled deferred techniques and tile-based transparency workflows in real-time engines.
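The binning and sorting step can be sketched on the CPU as follows. This is a simplified illustration under stated assumptions: each splat is reduced to a screen-space center, a conservative pixel radius, and a depth, and the function name `bin_and_sort` is hypothetical. GPU implementations typically pack the tile ID and depth into a single integer key and radix-sort it; a tuple sort gives the same ordering here.

```python
import math

TILE = 16  # tile edge length in pixels

def bin_and_sort(splats, width, height, tile=TILE):
    """Map each tile ID to the indices of overlapping splats, front to back.

    splats: list of (cx, cy, radius, depth), with cx, cy, radius in pixels.
    """
    tiles_x = math.ceil(width / tile)
    tiles_y = math.ceil(height / tile)
    pairs = []
    for idx, (cx, cy, radius, depth) in enumerate(splats):
        # Tile range covered by the splat's bounding square, clamped to the screen.
        x0 = max(0, int((cx - radius) // tile))
        x1 = min(tiles_x - 1, int((cx + radius) // tile))
        y0 = max(0, int((cy - radius) // tile))
        y1 = min(tiles_y - 1, int((cy + radius) // tile))
        for ty in range(y0, y1 + 1):
            for tx in range(x0, x1 + 1):
                pairs.append((ty * tiles_x + tx, depth, idx))
    pairs.sort()  # primary key: tile ID, secondary key: depth
    per_tile = {}
    for tile_id, _, idx in pairs:
        per_tile.setdefault(tile_id, []).append(idx)
    return per_tile

# A 64x32 image has 4x2 tiles. Splat 1 straddles tiles 0 and 1 and is nearest.
splats = [
    (8.0, 8.0, 4.0, 5.0),    # tile 0, depth 5
    (16.0, 8.0, 6.0, 2.0),   # tiles 0 and 1, depth 2
    (40.0, 24.0, 3.0, 1.0),  # tile 6, depth 1
]
per_tile = bin_and_sort(splats, width=64, height=32)
```

A splat that overlaps several tiles is duplicated once per tile, which is why large Gaussians cost more than small ones even before any pixel is shaded.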

04

Alpha compositing per pixel

For each pixel in a tile, the tile's depth-sorted Gaussians are alpha-composited front to back using the standard over operator:

C = Σᵢ cᵢ · αᵢ · Πⱼ<ᵢ (1 - αⱼ)

where αᵢ = oᵢ · exp(-0.5 · (x-μᵢ)ᵀ Σ'⁻¹ (x-μᵢ))

For rendering engineers: this is the same order-dependent transparency problem that order-independent transparency (OIT) techniques address. 3DGS handles it primarily through per-tile depth sorting.

3DGS can be viewed as a scene-wide transparency problem solved with explicit sorting plus Gaussian footprint evaluation.
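The two formulas above translate almost directly into a per-pixel loop. The sketch below is a scalar illustration rather than the CUDA kernel: the function names are hypothetical, the inverse 2D covariance Σ'⁻¹ is passed as three numbers (a, b, c) for the symmetric matrix [[a, b], [b, c]], and the early-exit threshold `min_transmittance` is an assumption added for illustration.

```python
import math

def gaussian_alpha(pixel, mean2d, conic, opacity):
    """alpha = o * exp(-0.5 * d^T Sigma'^-1 d), with d = pixel - mean2d."""
    a, b, c = conic
    dx = pixel[0] - mean2d[0]
    dy = pixel[1] - mean2d[1]
    return opacity * math.exp(-0.5 * (a * dx * dx + 2.0 * b * dx * dy + c * dy * dy))

def composite_pixel(pixel, splats, min_transmittance=1e-4):
    """Front-to-back over compositing for one pixel.

    splats: list of (mean2d, conic, opacity, rgb), already sorted near to far.
    Returns (rgb, accumulated_alpha).
    """
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0  # running product of (1 - alpha_j) for j < i
    for mean2d, conic, opacity, rgb in splats:
        alpha = gaussian_alpha(pixel, mean2d, conic, opacity)
        weight = alpha * transmittance
        for ch in range(3):
            color[ch] += weight * rgb[ch]
        transmittance *= 1.0 - alpha
        if transmittance < min_transmittance:
            break  # everything behind is effectively hidden
    return color, 1.0 - transmittance

# A half-opaque red splat in front of a blue one, both centered on the pixel.
conic = (0.01, 0.0, 0.01)
splats = [
    ((10.0, 10.0), conic, 0.5, (1.0, 0.0, 0.0)),
    ((10.0, 10.0), conic, 0.8, (0.0, 0.0, 1.0)),
]
rgb, alpha = composite_pixel((10.0, 10.0), splats)
```

At the splat centers the exponential is 1, so the red splat contributes 0.5 and the blue one 0.8 · (1 − 0.5) = 0.4, giving an accumulated alpha of 0.9. Swapping the two entries changes the result, which is the order dependence that the depth sort resolves.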

Python API

gsplat exposes a clean Python API. The main function is rasterization() — everything else builds on top of it.

Core rasterization call

from gsplat import rasterization

renders, alphas, info = rasterization(
    means,        # [N, 3]  Gaussian centers in world space
    quats,        # [N, 4]  rotation quaternions (wxyz order)
    scales,       # [N, 3]  scale per axis
    opacities,    # [N]     opacity values 0-1
    colors,       # [N, D]  per-Gaussian colors, or [N, K, 3] SH coefficients when sh_degree is set
    viewmats,     # [C, 4, 4]  world-to-camera transforms
    Ks,           # [C, 3, 3]  camera intrinsics
    width,        # int    image width in pixels
    height,       # int    image height in pixels
)

What comes back

renders [C, H, W, D]

Rendered images. C cameras, H×W pixels, D channels.

alphas [C, H, W, 1]

Accumulated alpha (opacity) per pixel. Useful for compositing.

info dict

Auxiliary data — per-Gaussian visibility, tile stats, useful for densification strategies.

Minimal working example

import torch
from gsplat import rasterization

N = 10000   # number of Gaussians
C = 1       # number of cameras
H, W = 480, 640

# Random scene (replace with loaded .ply data)
means    = torch.randn(N, 3, device="cuda")
quats    = torch.randn(N, 4, device="cuda")
quats    = quats / quats.norm(dim=-1, keepdim=True)  # normalize
scales   = torch.rand(N, 3, device="cuda") * 0.1
opacities = torch.rand(N, device="cuda")
colors   = torch.rand(N, 3, device="cuda")

# Identity camera at the origin, looking down +Z (gsplat uses the OpenCV camera convention)
viewmats = torch.eye(4, device="cuda").unsqueeze(0)  # [1, 4, 4]
Ks = torch.tensor([[[W, 0, W/2],
                    [0, H, H/2],
                    [0, 0,   1]]], device="cuda", dtype=torch.float32)

renders, alphas, info = rasterization(
    means, quats, scales, opacities, colors,
    viewmats, Ks, width=W, height=H
)

print("Rendered shape:", renders.shape)  # [1, 480, 640, 3]
print("Done.")
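The example above sets fx = W and fy = H, which is enough for a smoke test but gives different pixel scales horizontally and vertically. For real renders you would normally derive the intrinsics from the camera's field of view. A minimal sketch, assuming square pixels and a centered principal point; the helper name `intrinsics_from_fov` is hypothetical:

```python
import math

def intrinsics_from_fov(width, height, fov_x_deg):
    """3x3 pinhole intrinsics from a horizontal field of view in degrees."""
    fx = 0.5 * width / math.tan(0.5 * math.radians(fov_x_deg))
    fy = fx  # square pixels: same focal length on both axes
    return [
        [fx, 0.0, width / 2.0],
        [0.0, fy, height / 2.0],
        [0.0, 0.0, 1.0],
    ]

# A 90 degree horizontal field of view at 640x480 gives fx of about 320 pixels.
K = intrinsics_from_fov(640, 480, 90.0)
```

To feed this to the rasterizer, wrap it in a batch dimension, for example `Ks = torch.tensor([K], device="cuda", dtype=torch.float32)`, which has the [C, 3, 3] shape with C = 1.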

Loading a .ply scene file

import numpy as np
import torch
from plyfile import PlyData

def load_ply(path):
    ply = PlyData.read(path)
    v = ply['vertex']

    means = torch.tensor(
        np.stack([v['x'], v['y'], v['z']], axis=1),
        dtype=torch.float32, device='cuda'
    )
    scales = torch.tensor(
        np.exp(np.stack([v['scale_0'], v['scale_1'], v['scale_2']], axis=1)),
        dtype=torch.float32, device='cuda'
    )
    quats = torch.tensor(
        np.stack([v['rot_0'], v['rot_1'], v['rot_2'], v['rot_3']], axis=1),
        dtype=torch.float32, device='cuda'
    )
    opacities = torch.tensor(
        1 / (1 + np.exp(-v['opacity'])),  # sigmoid
        dtype=torch.float32, device='cuda'
    )
    return means, scales, quats, opacities

means, scales, quats, opacities = load_ply('/workspace/garden.ply')
print("Loaded", means.shape[0], "Gaussians")

Note: The .ply file stores scales as log values and opacities as logit values. Remember to exp() and sigmoid() them on load.
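The loader above returns geometry and opacity but no colors. In .ply files written by the INRIA training code, the base color is usually stored as degree-0 spherical harmonic coefficients. The field names `f_dc_0`, `f_dc_1`, and `f_dc_2` are an assumption about your file and should be checked against its header. A minimal sketch of the conversion to RGB, with a hypothetical helper name:

```python
# Degree-0 real spherical harmonic basis constant, 1 / (2 * sqrt(pi)).
SH_C0 = 0.28209479177387814

def sh_dc_to_rgb(f_dc):
    """Convert one Gaussian's degree-0 SH coefficients to RGB in [0, 1]."""
    return tuple(min(1.0, max(0.0, 0.5 + SH_C0 * c)) for c in f_dc)

# A zero coefficient maps to mid-gray.
gray = sh_dc_to_rgb((0.0, 0.0, 0.0))

# Large positive or negative coefficients are clamped to the displayable range.
clamped = sh_dc_to_rgb((5.0, -5.0, 0.0))
```

Applied to every vertex, this yields an [N, 3] array that can be passed as `colors`. It ignores the higher-order coefficients, so view-dependent effects such as specular highlights are lost.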

gsplat on DGX Spark

On ARM64 + CUDA 13, install from source in the NVIDIA PyTorch container for consistent results.

The problem

Fails: pip install gsplat
  Reason: no pre-compiled ARM64 + CUDA 13 wheel on PyPI.
Fails: TORCH_CUDA_ARCH_LIST="12.1"
  Reason: PyTorch 2.7 in the NVIDIA container doesn't know arch 12.1; use 12.0.
Fails: pip install gsplat without --no-build-isolation
  Reason: pip tries to install a conflicting PyTorch, overwriting NVIDIA's custom build.

Working install — verified Apr 2026

# Inside nvcr.io/nvidia/pytorch:25.03-py3 container
# DGX Spark · GB10 · CUDA 13.0 · ARM64

export TORCH_CUDA_ARCH_LIST="12.0"

pip install git+https://github.com/nerfstudio-project/gsplat.git \
  --no-build-isolation

# Verify
python3 -c "import gsplat; print('gsplat OK', gsplat.__version__)"
# → gsplat OK 1.5.3

Save as container image

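exit   # leave the container; the remaining commands run on the host
# Assumes the build container was started with --name gsplat-build and without --rm
docker commit gsplat-build vertexnova/gsplat-spark:v1
docker images | grep vertexnova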

Quick GPU sanity check inside container

python3 -c "
import torch
import gsplat

print('PyTorch:', torch.__version__)
print('CUDA available:', torch.cuda.is_available())
print('GPU:', torch.cuda.get_device_name(0))
print('gsplat:', gsplat.__version__)

# Quick rasterization smoke test
N = 1000
means = torch.randn(N, 3, device='cuda')
quats = torch.randn(N, 4, device='cuda')
quats = quats / quats.norm(dim=-1, keepdim=True)
scales = torch.rand(N, 3, device='cuda') * 0.05
opacities = torch.rand(N, device='cuda')
colors = torch.rand(N, 3, device='cuda')
viewmats = torch.eye(4, device='cuda').unsqueeze(0)
Ks = torch.tensor([[[640,0,320],[0,480,240],[0,0,1]]],
                   device='cuda', dtype=torch.float32)

renders, alphas, _ = gsplat.rasterization(
    means, quats, scales, opacities, colors,
    viewmats, Ks, width=640, height=480
)
print('Render shape:', renders.shape)
print('ALL OK')
"