
DGX Containers

NVIDIA DGX Spark · GB10 · ARM64 · CUDA 13

Build vertexnova/gsplat-spark and vertexnova/aiml-spark from the same PyTorch base image. Work through the tabs in order: prerequisites once, then each image, then verification and day-to-day use, with the summary and limitations last. For raw Docker commands, see the companion Docker on Spark guide.

DGX OS 7.4.0 · CUDA 13.0.2 · Driver 580.142 · ARM64 · GB10 · SM 12.0 · 128 GB unified memory

Prerequisites — do once after first boot

1 Disable swap done once

DGX Spark uses unified memory: the CPU and GPU share the same 128 GB pool. With swap enabled, a heavy GPU workload can trigger a death spiral: training fills memory → the OS swaps to disk → the machine freezes. With swap disabled, the kernel kills the offending job when memory runs out, so a machine freeze becomes a clean job crash.

# Comment out swap in fstab
sudo nano /etc/fstab
# Change: /swap.img  none  swap  sw  0  0
# To:   # /swap.img  none  swap  sw  0  0

# Apply immediately
sudo swapoff -a
swapon --show    # should print nothing
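
The fstab edit above can also be done without an interactive editor. The following is a minimal sketch, not part of the original procedure: comment_out_swap is a hypothetical helper name, and it assumes swap entries carry the word swap as a whitespace-separated field, as in the /swap.img line shown above.

```shell
# comment_out_swap is a hypothetical helper, not a system command.
# It comments out every active swap entry in an fstab-style file.
comment_out_swap() {
  local file="$1"
  # -i.bak edits in place and keeps the original as FILE.bak.
  # Only lines that are not already commented and that contain
  # "swap" as a whitespace-separated field get a "# " prefix.
  sed -E -i.bak '/^[[:space:]]*[^#[:space:]].*[[:space:]]swap[[:space:]]/ s/^/# /' "$file"
}
```

Try it on a copy of /etc/fstab first. To change the real file, run the same sed command with sudo against /etc/fstab, then apply `sudo swapoff -a` as shown above.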

2 Verify GPU access from Docker done once

docker run --rm --gpus all \
  nvcr.io/nvidia/cuda:13.0.0-base-ubuntu24.04 nvidia-smi

Expected output: NVIDIA-SMI 580.142 · Driver 580.142 · CUDA 13.0 · GB10

3 Pull base image done once

All containers build on NVIDIA's official PyTorch container — Blackwell-optimised, ARM64 native, CUDA 13, PyTorch 2.7.

docker pull nvcr.io/nvidia/pytorch:25.03-py3
# Size: 21.8GB — download once, reuse for all containers

4 Standard run flags always use

Use these flags for every container on DGX Spark. Without --ipc=host, the container's shared memory (/dev/shm) is limited to Docker's 64 MB default, and PyTorch can crash during training, typically in multi-worker data loading.

# --ipc=host                 shared memory, required for PyTorch
# --ulimit memlock=-1        no memory lock limit
# --ulimit stack=67108864    64MB stack
# -v ~/workspace:/workspace  persist files
# -p 7007:7007               nerfstudio viewer
docker run --gpus all -it \
  --name CONTAINER_NAME \
  --ipc=host \
  --ulimit memlock=-1 \
  --ulimit stack=67108864 \
  -v ~/workspace:/workspace \
  -p 7007:7007 \
  IMAGE_NAME bash
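
Because the same flags recur in every launch command in this guide, they can be wrapped once in a shell function. This is a sketch under assumptions: spark_run is a hypothetical name, and port mappings are passed as extra arguments because they differ per image.

```shell
# spark_run is a hypothetical wrapper around the standard flags above.
# Usage: spark_run CONTAINER_NAME IMAGE_NAME [extra docker run options...]
spark_run() {
  local name="$1" image="$2"
  shift 2
  # Any remaining arguments (for example -p 7007:7007) are passed
  # through to docker run ahead of the image name.
  docker run --gpus all -it \
    --name "$name" \
    --ipc=host \
    --ulimit memlock=-1 \
    --ulimit stack=67108864 \
    -v "$HOME/workspace:/workspace" \
    "$@" \
    "$image" bash
}
```

For example, `spark_run gsplat-dev vertexnova/gsplat-spark:v1 -p 7007:7007` expands to the gsplat launch command used later in this guide.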

5 CUDA arch for GB10 critical

Target SM 12.0 when building for the GB10. The device itself reports compute capability 12.1, but PyTorch 2.7 in the NVIDIA container does not recognise 12.1 as a build target, so use 12.0 when building any CUDA extension from source.

export TORCH_CUDA_ARCH_LIST="12.0"
# Set this before: pip install gsplat, any CUDA extension build

Using 12.1 raises an unknown-arch error in this PyTorch build. Use 12.0 only.
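
A build script can fail fast when the variable is missing or set to the wrong value, instead of failing minutes into compilation. The following is a minimal sketch; require_arch is a hypothetical guard, not part of PyTorch or pip.

```shell
# require_arch is a hypothetical guard. It succeeds only if
# TORCH_CUDA_ARCH_LIST equals the expected value (default 12.0,
# the value this guide uses for GB10).
require_arch() {
  local want="${1:-12.0}"
  if [ "${TORCH_CUDA_ARCH_LIST:-}" != "$want" ]; then
    echo "TORCH_CUDA_ARCH_LIST is '${TORCH_CUDA_ARCH_LIST:-unset}', expected '$want'" >&2
    return 1
  fi
  echo "TORCH_CUDA_ARCH_LIST=$want"
}
```

Chaining it in front of a source build, as in `require_arch && pip install ...`, stops the build before it starts if the export was forgotten in a fresh shell.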

vertexnova/gsplat-spark:v1

3D Gaussian Splatting workspace — gsplat 1.5.3, PyTorch 2.7, ARM64 native.

PyTorch · 2.7.0a0 · NVIDIA build, pre-installed

gsplat · 1.5.3 · built from source, SM 12.0

CUDA · 13.0.2 · pre-installed, SM 12.0

gdown · 6.0.0 · scene download utility

1 Launch base container

docker run --gpus all -it \
  --name gsplat-build \
  --ipc=host \
  --ulimit memlock=-1 \
  --ulimit stack=67108864 \
  -v ~/workspace:/workspace \
  -p 7007:7007 \
  nvcr.io/nvidia/pytorch:25.03-py3 bash

2 Verify GPU inside container

python3 -c "import torch; print(torch.cuda.get_device_name(0))"
# → NVIDIA GB10

3 Install gsplat from source

Build from source: no pre-compiled ARM64 + CUDA 13 wheel exists on PyPI. The --no-build-isolation flag makes pip build against the PyTorch already installed in the container instead of pulling a separate, conflicting PyTorch version into an isolated build environment.

export TORCH_CUDA_ARCH_LIST="12.0"

pip install git+https://github.com/nerfstudio-project/gsplat.git \
  --no-build-isolation

# Compilation takes 5-10 minutes — normal

Do not point pip at the public PyTorch cu130 index here; it conflicts with the NVIDIA container build.

4 Verify gsplat

python3 -c "import gsplat; print('gsplat OK', gsplat.__version__)"
# → gsplat OK 1.5.3

5 Exit and commit

exit

docker commit gsplat-build vertexnova/gsplat-spark:v1
docker images | grep vertexnova

vertexnova/aiml-spark:v1

CV, tabular work, light 3D mesh I/O, and JupyterLab on top of the same NVIDIA PyTorch base. NumPy, pandas, matplotlib, and scikit-learn already ship in the base image.

PyTorch · 2.7.0a0 · NVIDIA base image

OpenCV · 4.10.0 · headless, pip stage 1

JupyterLab · latest · pip stage 4

trimesh, plyfile · 4.x / 1.x · mesh + PLY I/O

scikit-image, imageio · 0.26 / 2.37 · pip stage 1

seaborn, rich, tqdm · unpinned · pip stage 2

Not covered here: Open3D (no stable ARM64 wheel in this flow), TensorFlow on Python 3.12 (use a separate TF-focused image if needed).

1 Launch base container

docker run --gpus all -it \
  --name aiml-dev \
  --ipc=host \
  --ulimit memlock=-1 \
  --ulimit stack=67108864 \
  -v ~/workspace:/workspace \
  nvcr.io/nvidia/pytorch:25.03-py3 bash

2 Stage 1 — computer vision

pip install \
  opencv-python-headless \
  scikit-image \
  imageio \
  Pillow

3 Stage 2 — data science

pip install \
  seaborn \
  tqdm \
  rich
# pandas, matplotlib, scikit-learn, numpy already in base image

4 Stage 3 — 3D geometry

pip install trimesh plyfile
# open3d skipped — no ARM64 wheel available

5 Stage 4 — dev tools

pip install \
  jupyterlab \
  ipywidgets \
  gdown \
  python-dotenv \
  pyyaml \
  requests

6 Exit and commit

exit

docker commit aiml-dev vertexnova/aiml-spark:v1
docker images | grep vertexnova

Verify & launch containers

1 Health checks after every reboot

# 1. GPU driver OK
nvidia-smi

# 2. Docker GPU access OK
docker run --rm --gpus all \
  nvcr.io/nvidia/cuda:13.0.0-base-ubuntu24.04 nvidia-smi

# 3. Images intact
docker images | grep vertexnova

# 4. Optional — containers you expect to stay up
docker ps | grep YOUR_SERVICE
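
The first three checks can be bundled into one function that prints a PASS or FAIL line per check. This is a sketch only: check, images_present, and spark_health are hypothetical helper names, and the optional service check is left out because the service name is site-specific.

```shell
# check is a hypothetical helper: run a command quietly and
# report PASS or FAIL with the given label.
check() {
  local label="$1"
  shift
  if "$@" >/dev/null 2>&1; then
    echo "PASS  $label"
  else
    echo "FAIL  $label"
    return 1
  fi
}

# Succeeds if at least one vertexnova image is listed.
images_present() {
  docker images | grep -q vertexnova
}

# Runs the three checks above; returns non-zero if any of them failed.
spark_health() {
  local failed=0
  check "GPU driver" nvidia-smi || failed=1
  check "Docker GPU access" docker run --rm --gpus all \
    nvcr.io/nvidia/cuda:13.0.0-base-ubuntu24.04 nvidia-smi || failed=1
  check "vertexnova images" images_present || failed=1
  return "$failed"
}
```

Running `spark_health` after a reboot gives one line per check. The command output itself is discarded, so run the individual commands above when a check fails and you need the details.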

2 Launch gsplat container

docker run --gpus all -it \
  --name gsplat-dev \
  --ipc=host \
  --ulimit memlock=-1 \
  --ulimit stack=67108864 \
  -v ~/workspace:/workspace \
  -p 7007:7007 \
  vertexnova/gsplat-spark:v1 bash

# Inside — verify
python3 -c "import gsplat; print('gsplat', gsplat.__version__)"

3 Launch AI/ML container

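# The name aiml-dev is still held by the build container from the
# aiml-spark build steps. If it still exists, remove it first: docker rm aiml-dev
docker run --gpus all -it \
  --name aiml-dev \
  --ipc=host \
  --ulimit memlock=-1 \
  --ulimit stack=67108864 \
  -v ~/workspace:/workspace \
  -p 8888:8888 \
  vertexnova/aiml-spark:v1 bash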

# Inside — verify everything
python3 -c "
import torch, cv2, numpy, pandas, trimesh, plyfile
print('torch:', torch.__version__, '| GPU:', torch.cuda.get_device_name(0))
print('opencv:', cv2.__version__)
print('trimesh:', trimesh.__version__)
print('ALL OK')
"

4 Launch JupyterLab

# Inside aiml-dev container
jupyter lab --ip=0.0.0.0 --port=8888 --no-browser --allow-root

# On your laptop browser
# http://dgx-spark.local:8888

5 Re-enter stopped container

# Check if stopped
docker ps -a | grep dev

# Restart and re-enter
docker start aiml-dev
docker exec -it aiml-dev bash

# Or one line
docker start -ai aiml-dev
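
The check, start, and exec sequence can be folded into one helper that works whether the container is running or stopped. This is a sketch, with enter as a hypothetical function name; it relies on `docker inspect -f '{{.State.Running}}'`, which prints true or false for an existing container.

```shell
# enter is a hypothetical helper. It starts the container if it is
# stopped, then opens a bash shell inside it.
enter() {
  local name="$1" running
  # docker inspect fails if the container does not exist.
  running=$(docker inspect -f '{{.State.Running}}' "$name" 2>/dev/null) || {
    echo "no such container: $name" >&2
    return 1
  }
  if [ "$running" != "true" ]; then
    docker start "$name" >/dev/null
  fi
  docker exec -it "$name" bash
}
```

For example, `enter aiml-dev` replaces the two-step restart above. Unlike `docker start -ai`, leaving the shell opened by exec does not stop the container.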

6 Save work after install session

exit
docker commit aiml-dev vertexnova/aiml-spark:v2
docker images | grep vertexnova
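
If you commit after every install session, the next version tag can be derived from the existing ones rather than typed by hand. The following is a sketch, assuming tags follow the v1, v2 pattern used in this guide; next_tag is a hypothetical helper name.

```shell
# next_tag is a hypothetical helper. It reads one tag per line on
# stdin and prints the next tag in the v1, v2, ... sequence.
# Tags that are not "v" followed only by digits are ignored.
next_tag() {
  local max=0 tag n
  while read -r tag; do
    n="${tag#v}"
    case "$tag" in v*) ;; *) continue ;; esac
    case "$n" in ''|*[!0-9]*) continue ;; esac
    if [ "$n" -gt "$max" ]; then max="$n"; fi
  done
  echo "v$((max + 1))"
}
```

One way to use it: `tag=$(docker images --format '{{.Tag}}' vertexnova/aiml-spark | next_tag)` followed by `docker commit aiml-dev "vertexnova/aiml-spark:$tag"`.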

Container registry summary

vertexnova/gsplat-spark · v1 · built

3D Gaussian Splatting workspace.

gsplat 1.5.3 · PyTorch 2.7 · CUDA 13 · ARM64 · gdown

21.9 GB

vertexnova/aiml-spark · v1 · built

General AI/ML workspace.

PyTorch 2.7 · OpenCV 4.10 · trimesh · plyfile · JupyterLab · pandas · scikit-learn · seaborn

22.0 GB

vertexnova/nerfstudio-spark · planned

Full nerfstudio with COLMAP for training 3DGS from custom scenes.

nerfstudio · COLMAP · gsplat · Open3D

vertexnova/vertexnova-spark · planned

C++ rendering engine development: Vulkan, CMake, SPIRV-Cross.

CMake · Vulkan SDK · SPIRV-Cross · clang

Known ARM64 + CUDA 13 limitations

open3d · no wheel · No ARM64 binary. Build from source (~60 min) or use trimesh instead.

tensorflow-cpu · no wheel · No Python 3.12 ARM64 wheel. Use a dedicated TF container with Python 3.10.

vllm · fragile · No stable cu130 aarch64 release. Pin to a specific nightly wheel.

flash-attn · fragile · aarch64 builds exist but need TORCH_CUDA_ARCH_LIST=12.0 at build time.

CUDA 12.x packages · incompatible · DGX Spark only has libcudart.so.13. Any package built against CUDA 12 will fail with ImportError.
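
To see which CUDA runtime a compiled extension expects before it fails at import time, inspect its shared-library dependencies with ldd. The following is a sketch; cudart_sonames is a hypothetical helper that filters ldd output, and the path in the usage note is a placeholder.

```shell
# cudart_sonames is a hypothetical helper. It reads ldd output on
# stdin and prints each distinct libcudart soname it mentions,
# one per line (for example libcudart.so.12 or libcudart.so.13).
cudart_sonames() {
  grep -o 'libcudart\.so\.[0-9][0-9]*' | sort -u
}
```

For example, `ldd /path/to/extension.so | cudart_sonames` printing libcudart.so.12 indicates a CUDA 12 build, which will not load on DGX Spark. No output means the library does not link against the CUDA runtime directly.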