
gsplat on DGX Spark

gsplat is a CUDA-accelerated differentiable rasterizer for 3D Gaussian Splatting. This page explains what it does, how the pipeline works, how to use the API, and how to run it reliably on DGX Spark.

gsplat 1.5.3 · PyTorch 2.7 · CUDA 13 · ARM64 · memory-efficient rasterizer

Overview

gsplat is the rasterization engine at the heart of many 3D Gaussian Splatting pipelines. When you train a scene with nerfstudio, gsplat is what renders the Gaussians into pixels: every training iteration, and in real time during inference. The original 3DGS paper code ships its own CUDA rasterizer, which gsplat is designed to replace.

What it does

Takes a set of 3D Gaussian ellipsoids — each with a position, size, rotation, color, and opacity — and renders them into a 2D image from any camera viewpoint.

Why it matters

It's differentiable — gradients flow back through the rasterizer to the Gaussian parameters. This is what makes training possible. You can optimize the scene from 2D photo supervision.

vs. original implementation

Up to 4x less GPU memory and up to 15% less training time on Mip-NeRF 360 captures compared to the official INRIA code. Significantly faster for large scene rendering.

Who develops it

UC Berkeley, NVIDIA, Meta, LumaAI, CMU, ShanghaiTech, Amazon. Born from nerfstudio team curiosity about 3DGS. Now the de-facto standard rasterizer.

Where gsplat fits

Your application / VertexNova
    ↑ rendered images / ↓ camera poses
nerfstudio · original 3DGS (INRIA)
    ↑ gradients / ↓ Gaussian params
gsplat (rasterizer)
    ↑ / ↓
CUDA kernels on GPU

gsplat is the bridge between Gaussian scene parameters and rendered pixels during both training and inference.

How 3DGS rasterization works

The rasterization pipeline has four main stages. Each maps clearly to concepts from traditional real-time rendering.

Represent the scene as 3D Gaussians

Each Gaussian is defined by:

means μ ∈ ℝ³ — 3D position in world space
scales s ∈ ℝ³ — size along each axis (forms the covariance)
quaternion q ∈ ℝ⁴ — rotation of the ellipsoid
opacity o ∈ [0,1] — how opaque the Gaussian is (0 is fully transparent, 1 is fully opaque)
color c — spherical harmonic coefficients for view-dependent color

The covariance matrix is reconstructed as Σ = R·S·Sᵀ·Rᵀ where R is from the quaternion and S is diagonal from scales.
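The reconstruction above can be sketched in a few lines of NumPy. This is an illustrative sketch, not gsplat's CUDA code: the helper names are made up for this page, and the quaternion is assumed to be in (w, x, y, z) order.

```python
import numpy as np

def quat_to_rotmat(q):
    """Rotation matrix from a quaternion, assumed to be in (w, x, y, z) order."""
    w, x, y, z = q / np.linalg.norm(q)  # normalize so R is a pure rotation
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z),     2 * (x * z + w * y)],
        [2 * (x * y + w * z),     1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y),     2 * (y * z + w * x),     1 - 2 * (x * x + y * y)],
    ])

def covariance_3d(q, s):
    """Sigma = R S S^T R^T, with S = diag(s)."""
    R = quat_to_rotmat(np.asarray(q, dtype=np.float64))
    S = np.diag(np.asarray(s, dtype=np.float64))
    return R @ S @ S.T @ R.T

q = np.array([0.9, 0.1, -0.3, 0.2])   # arbitrary rotation (normalized inside)
s = np.array([0.5, 0.2, 0.1])         # per-axis scales
cov = covariance_3d(q, s)
eigvals = np.sort(np.linalg.eigvalsh(cov))
```

Building Σ this way keeps it symmetric and positive semi-definite by construction, which is why the optimizer works on q and s rather than on the matrix entries directly. The eigenvalues of the result are the squared scales.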

Project to 2D image plane

Each 3D Gaussian is projected onto the camera's image plane using the Jacobian of the perspective projection:

J = [fx/z,    0,  -fx·x/z²]
    [0,    fy/z, -fy·y/z²]
    [0,       0,         0]

Σ' = J · W · Σ · Wᵀ · Jᵀ

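Here (x, y, z) is the Gaussian center in camera space, W is the 3×3 rotation part of the world-to-camera transform, and fx, fy are the focal lengths in pixels. J is evaluated at the Gaussian center, so the projection is a local affine approximation. Because the third row of J is zero, only the upper-left 2×2 block of Σ' is used: that block is the 2D Gaussian "splat" on screen.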

For a rendering engineer: this is the same kind of projection math used in any rasterizer — the twist is that instead of triangles, you're projecting ellipsoids.
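As a rough NumPy sketch of the projection step, the function below uses only the top two rows of J, so it returns the 2×2 screen-space covariance directly. The function name and the example values are illustrative assumptions, not part of gsplat's API.

```python
import numpy as np

def project_covariance(mean_world, cov_world, viewmat, fx, fy):
    """Project a 3D covariance to a 2x2 screen-space covariance (in pixels^2)."""
    W = viewmat[:3, :3]              # rotation part of world-to-camera
    t = viewmat[:3, 3]               # translation part
    x, y, z = W @ mean_world + t     # Gaussian center in camera space
    # Top two rows of the perspective Jacobian, evaluated at the center
    J = np.array([
        [fx / z, 0.0,    -fx * x / z**2],
        [0.0,    fy / z, -fy * y / z**2],
    ])
    return J @ W @ cov_world @ W.T @ J.T

# Isotropic Gaussian (sigma = 0.1) on the optical axis, 2 units from the camera
mean = np.array([0.0, 0.0, 2.0])
cov = 0.01 * np.eye(3)
cov2d = project_covariance(mean, cov, np.eye(4), fx=100.0, fy=100.0)
```

With these numbers the splat has a standard deviation of 0.1 · 100 / 2 = 5 pixels, so cov2d is 25 times the 2×2 identity.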

Tile-based sorting

The screen is divided into 16×16 pixel tiles. For each tile, all Gaussians that overlap it are collected and sorted by depth (front to back). This is what makes the renderer GPU-friendly — tiles map naturally to thread blocks.

For rendering engineers: this is conceptually close to tiled deferred techniques and tile-based transparency workflows in real-time engines.
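A CPU sketch of the binning and sorting idea, assuming each splat is approximated by a circle of known pixel radius. The function name and the toy data are made up for illustration; GPU implementations typically do this with a single radix sort over packed (tile, depth) keys rather than Python loops.

```python
import numpy as np

TILE = 16  # tile edge in pixels

def bin_and_sort(centers, radii, depths, width, height, tile=TILE):
    """Return (tile_ids, gaussian_ids) sorted by tile, then by depth (near first)."""
    tiles_x = (width + tile - 1) // tile
    tiles_y = (height + tile - 1) // tile
    tile_ids, gauss_ids = [], []
    for g, ((cx, cy), r) in enumerate(zip(centers, radii)):
        # Range of tiles touched by the splat's bounding square, clamped to the screen
        x0 = max(int((cx - r) // tile), 0)
        x1 = min(int((cx + r) // tile), tiles_x - 1)
        y0 = max(int((cy - r) // tile), 0)
        y1 = min(int((cy + r) // tile), tiles_y - 1)
        for ty in range(y0, y1 + 1):
            for tx in range(x0, x1 + 1):
                tile_ids.append(ty * tiles_x + tx)
                gauss_ids.append(g)
    tile_ids = np.array(tile_ids, dtype=np.int64)
    gauss_ids = np.array(gauss_ids, dtype=np.int64)
    # np.lexsort uses the LAST key as the primary key: tile first, then depth
    order = np.lexsort((depths[gauss_ids], tile_ids))
    return tile_ids[order], gauss_ids[order]

centers = np.array([[8.0, 8.0], [16.0, 8.0], [10.0, 10.0]])
radii = np.array([4.0, 4.0, 3.0])
depths = np.array([5.0, 2.0, 1.0])
tile_ids, gauss_ids = bin_and_sort(centers, radii, depths, width=64, height=32)
```

Gaussian 1 straddles the boundary between tiles 0 and 1, so it appears twice in the output. After the sort, each tile's Gaussians form one contiguous, depth-ordered run that a thread block can walk through.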

Alpha compositing per pixel

Within each tile, Gaussians are alpha-composited front-to-back using the standard over operator:

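C = Σᵢ cᵢ · αᵢ · Π_{j<i} (1 - αⱼ)

where αᵢ = oᵢ · exp(-0.5 · (x-μ'ᵢ)ᵀ Σ'ᵢ⁻¹ (x-μ'ᵢ)), x is the pixel position, and μ'ᵢ and Σ'ᵢ are the projected 2D mean and covariance of Gaussian i.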

For rendering engineers: this is the same order-dependent transparency problem that OIT (order-independent transparency) techniques try to avoid. 3DGS handles it primarily through per-tile depth sorting.

3DGS can be viewed as a scene-wide transparency problem solved with explicit sorting plus Gaussian footprint evaluation.
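The compositing sum for a single pixel can be written as a short loop that tracks the remaining transmittance. This is a plain NumPy sketch with a made-up function name, and it assumes the Gaussians are already sorted front to back. Real kernels typically also skip near-zero alphas and stop early once transmittance is negligible.

```python
import numpy as np

def composite_pixel(pixel, means2d, covs2d, opacities, colors):
    """Front-to-back alpha compositing of depth-sorted 2D Gaussians at one pixel."""
    color = np.zeros(3)
    transmittance = 1.0                      # product of (1 - alpha_j) for j < i
    for mu, cov, o, c in zip(means2d, covs2d, opacities, colors):
        d = pixel - mu
        alpha = o * np.exp(-0.5 * d @ np.linalg.inv(cov) @ d)
        color += c * alpha * transmittance
        transmittance *= 1.0 - alpha
    return color, 1.0 - transmittance        # accumulated color and alpha

pixel = np.array([0.0, 0.0])
means2d = [np.array([0.0, 0.0]), np.array([0.0, 0.0])]   # both centered on the pixel
covs2d = [np.eye(2), np.eye(2)]
opacities = [0.5, 0.5]
colors = [np.array([1.0, 0.0, 0.0]), np.array([0.0, 0.0, 1.0])]  # red in front of blue
color, alpha = composite_pixel(pixel, means2d, covs2d, opacities, colors)
```

The front red Gaussian contributes 0.5, the blue one behind it contributes 0.5 · (1 - 0.5) = 0.25, and the accumulated alpha is 0.75.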

Python API

gsplat exposes a clean Python API. The main function is rasterization() — everything else builds on top of it.

Core rasterization call

from gsplat import rasterization

renders, alphas, info = rasterization(
    means,        # [N, 3]  Gaussian centers in world space
    quats,        # [N, 4]  rotation quaternions (normalized)
    scales,       # [N, 3]  scale per axis
    opacities,    # [N]     opacity values 0-1
    colors,       # [N, D]  per-Gaussian colors, or [N, K, 3] SH coefficients when sh_degree is set
    viewmats,     # [C, 4, 4]  world-to-camera transforms
    Ks,           # [C, 3, 3]  camera intrinsics
    width,        # int    image width in pixels
    height,       # int    image height in pixels
)

What comes back

renders [C, H, W, D]

Rendered images. C cameras, H×W pixels, D channels.

alphas [C, H, W, 1]

Accumulated alpha (opacity) per pixel. Useful for compositing.

info dict

Auxiliary data — per-Gaussian visibility, tile stats, useful for densification strategies.
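One common use of alphas is placing the render over a background color. The sketch below uses NumPy arrays as stand-ins for the returned tensors and assumes renders already holds alpha-weighted color, as the compositing sum earlier on this page implies. The function name is hypothetical.

```python
import numpy as np

def composite_over_background(renders, alphas, background):
    """renders: [C, H, W, 3] alpha-weighted color, alphas: [C, H, W, 1], background: RGB."""
    bg = np.asarray(background, dtype=renders.dtype)
    # Whatever transmittance is left after all Gaussians goes to the background
    return renders + (1.0 - alphas) * bg

# Stand-in data: one covered pixel, everything else empty
renders = np.zeros((1, 2, 2, 3), dtype=np.float32)
alphas = np.zeros((1, 2, 2, 1), dtype=np.float32)
renders[0, 0, 0] = [0.2, 0.4, 0.6]
alphas[0, 0, 0] = 1.0

final = composite_over_background(renders, alphas, (1.0, 1.0, 1.0))
```

Fully covered pixels keep their rendered color, and empty pixels take the background color. The same arithmetic should apply to the PyTorch tensors returned by rasterization(), provided the background is a tensor on the same device.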

Minimal working example

import torch
from gsplat import rasterization

N = 10000   # number of Gaussians
C = 1       # number of cameras
H, W = 480, 640

# Random scene (replace with loaded .ply data)
means    = torch.randn(N, 3, device="cuda")
quats    = torch.randn(N, 4, device="cuda")
quats    = quats / quats.norm(dim=-1, keepdim=True)  # normalize
scales   = torch.rand(N, 3, device="cuda") * 0.1
opacities = torch.rand(N, device="cuda")
colors   = torch.rand(N, 3, device="cuda")

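# Identity camera at the origin. gsplat follows the OpenCV convention (+Z forward),
# so only Gaussians in front of the camera (positive z) are rendered.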
viewmats = torch.eye(4, device="cuda").unsqueeze(0)  # [1, 4, 4]
Ks = torch.tensor([[[W, 0, W/2],
                    [0, H, H/2],
                    [0, 0,   1]]], device="cuda", dtype=torch.float32)

renders, alphas, info = rasterization(
    means, quats, scales, opacities, colors,
    viewmats, Ks, width=W, height=H
)

print("Rendered shape:", renders.shape)  # [1, 480, 640, 3]
print("Done.")
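For anything beyond the identity camera, you need a real world-to-camera matrix and intrinsics. The helpers below are a sketch with hypothetical names, assuming an OpenCV-style camera (+X right, +Y down, +Z forward) and square pixels. Check the convention against the gsplat docs before relying on it.

```python
import numpy as np

def look_at_opencv(eye, target, up=(0.0, 1.0, 0.0)):
    """World-to-camera 4x4 matrix for a camera at `eye` looking at `target`."""
    eye = np.asarray(eye, dtype=np.float64)
    forward = np.asarray(target, dtype=np.float64) - eye
    forward = forward / np.linalg.norm(forward)            # camera +Z
    right = np.cross(forward, np.asarray(up, dtype=np.float64))
    right = right / np.linalg.norm(right)                  # camera +X
    down = np.cross(forward, right)                        # camera +Y (image down)
    R = np.stack([right, down, forward])                   # rows are the camera axes
    viewmat = np.eye(4)
    viewmat[:3, :3] = R
    viewmat[:3, 3] = -R @ eye
    return viewmat

def intrinsics_from_fov(fov_x_deg, width, height):
    """Pinhole intrinsics from a horizontal field of view, principal point at center."""
    fx = 0.5 * width / np.tan(0.5 * np.radians(fov_x_deg))
    return np.array([[fx, 0.0, width / 2],
                     [0.0, fx, height / 2],
                     [0.0, 0.0, 1.0]])

viewmat = look_at_opencv(eye=(3.0, 2.0, -4.0), target=(0.0, 0.0, 0.0))
K = intrinsics_from_fov(90.0, width=640, height=480)
target_cam = viewmat @ np.array([0.0, 0.0, 0.0, 1.0])      # target in camera space
```

The target lands on the optical axis at a depth equal to its distance from the camera. To pass these to rasterization(), convert them to float32 CUDA tensors and add the leading camera dimension, for example with torch.tensor(viewmat, dtype=torch.float32, device="cuda").unsqueeze(0).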

Loading a .ply scene file

import numpy as np
import torch
from plyfile import PlyData

def load_ply(path):
    ply = PlyData.read(path)
    v = ply['vertex']

    means = torch.tensor(
        np.stack([v['x'], v['y'], v['z']], axis=1),
        dtype=torch.float32, device='cuda'
    )
    scales = torch.tensor(
        np.exp(np.stack([v['scale_0'], v['scale_1'], v['scale_2']], axis=1)),
        dtype=torch.float32, device='cuda'
    )
    quats = torch.tensor(
        np.stack([v['rot_0'], v['rot_1'], v['rot_2'], v['rot_3']], axis=1),
        dtype=torch.float32, device='cuda'
    )
    opacities = torch.tensor(
        1 / (1 + np.exp(-v['opacity'])),  # sigmoid
        dtype=torch.float32, device='cuda'
    )
    return means, scales, quats, opacities

means, scales, quats, opacities = load_ply('/workspace/garden.ply')
print("Loaded", means.shape[0], "Gaussians")

Note: the .ply file stores scales as log values and opacities as logits, so apply exp() and sigmoid() to them on load.
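The loader above does not return colors. As a hedged sketch: files written by the original 3DGS training code usually store the degree-0 spherical harmonic coefficient per channel in fields named f_dc_0, f_dc_1, f_dc_2 (an assumption about your file, so check its header). Converting that DC term to a view-independent RGB color looks like this, with a made-up helper name:

```python
import numpy as np

SH_C0 = 0.28209479177387814  # degree-0 SH basis constant, 1 / (2 * sqrt(pi))

def sh_dc_to_rgb(f_dc):
    """Convert degree-0 SH coefficients [N, 3] to RGB in [0, 1]."""
    f_dc = np.asarray(f_dc, dtype=np.float32)
    return np.clip(0.5 + SH_C0 * f_dc, 0.0, 1.0)

# Stand-in coefficients; with plyfile this would be something like
# np.stack([v['f_dc_0'], v['f_dc_1'], v['f_dc_2']], axis=1)  (assumed field names)
f_dc = np.array([[0.0, 0.0, 0.0],
                 [1.0, -1.0, 5.0]], dtype=np.float32)
rgb = sh_dc_to_rgb(f_dc)
```

A zero coefficient maps to mid-gray (0.5). The resulting [N, 3] array can be turned into a CUDA tensor and passed as colors for a quick look at the scene without view-dependent shading.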

gsplat on DGX Spark

On ARM64 + CUDA 13, install from source in the NVIDIA PyTorch container for consistent results.

The problem

fails

pip install gsplat

No pre-compiled ARM64 + CUDA 13 wheel on PyPI.

fails

TORCH_CUDA_ARCH_LIST="12.1"

PyTorch 2.7 in the NVIDIA container doesn't recognize arch 12.1; use 12.0.

fails

pip install gsplat without --no-build-isolation

pip tries to install a conflicting PyTorch, overwriting NVIDIA's custom build.

Working install — verified Apr 2026

# Inside nvcr.io/nvidia/pytorch:25.03-py3 container
# DGX Spark · GB10 · CUDA 13.0 · ARM64

export TORCH_CUDA_ARCH_LIST="12.0"

pip install git+https://github.com/nerfstudio-project/gsplat.git \
  --no-build-isolation

# Verify
python3 -c "import gsplat; print('gsplat OK', gsplat.__version__)"
# → gsplat OK 1.5.3

Save as container image

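# Leave the container shell; the remaining commands run on the host.
# gsplat-build is the name of the container you just exited.
exit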
docker commit gsplat-build vertexnova/gsplat-spark:v1
docker images | grep vertexnova

Quick GPU sanity check inside container

python3 -c "
import torch
import gsplat

print('PyTorch:', torch.__version__)
print('CUDA available:', torch.cuda.is_available())
print('GPU:', torch.cuda.get_device_name(0))
print('gsplat:', gsplat.__version__)

# Quick rasterization smoke test
N = 1000
means = torch.randn(N, 3, device='cuda')
quats = torch.randn(N, 4, device='cuda')
quats = quats / quats.norm(dim=-1, keepdim=True)
scales = torch.rand(N, 3, device='cuda') * 0.05
opacities = torch.rand(N, device='cuda')
colors = torch.rand(N, 3, device='cuda')
viewmats = torch.eye(4, device='cuda').unsqueeze(0)
Ks = torch.tensor([[[640,0,320],[0,480,240],[0,0,1]]],
                   device='cuda', dtype=torch.float32)

renders, alphas, _ = gsplat.rasterization(
    means, quats, scales, opacities, colors,
    viewmats, Ks, width=640, height=480
)
print('Render shape:', renders.shape)
print('ALL OK')
"

Integration plan for VertexNova

Phased plan for moving from validated Python usage to native engine integration.

done

gsplat installed on DGX Spark

gsplat 1.5.3 running on GB10 Blackwell, ARM64, CUDA 13. Saved as vertexnova/gsplat-spark:v1.

active

First scene rendering

Download garden.ply pre-trained scene. Render from a fixed camera using gsplat.rasterization(). Save as PNG. Understand what the output looks like.

planned

vne3dgs module in VertexNova

New module in VertexNova that loads a .ply file, parses Gaussian attributes, and passes them to a render pass. Start with OpenGL, port to Vulkan later.

planned

Vulkan 3DGS rasterizer

Implement tile-based Gaussian rasterization in Vulkan compute shaders. Replace the Python/CUDA path with a native C++ implementation inside VertexNova's RHI.

planned

Surgical scene reconstruction

Capture surgical anatomy from multiple views → COLMAP → gsplat training → real-time Vulkan render. Novel combination of OR rendering expertise + 3DGS.

Key resources

gsplat docs: Official documentation (API reference, examples, conventions)
gsplat GitHub: Source code (CUDA kernels, Python bindings, examples)
Original 3DGS paper: Kerbl et al., SIGGRAPH 2023, the paper gsplat is based on
nerfstudio docs: Training pipeline that uses gsplat as its rasterization backend