Docker on Spark

Commands and patterns · DGX Spark

Use this page alongside DGX Containers when you need exact flags, inspect commands, or cleanup order.

What is Docker

Docker packages software and all its dependencies into a container — a lightweight isolated environment that runs the same way on any machine.

On the DGX Spark this matters especially because the machine runs ARM64 + CUDA 13. Most ML packages don't have pre-built wheels for this combo. NVIDIA ships optimised containers that work out of the box.

Host rule: do not install ML stacks on the DGX OS. Use containers so drivers and Docker stay the only moving parts on the host.

Core terms

Image

Read-only template: filesystem layers and metadata. Created by pull or build; you do not edit it in place.

Container

Runnable instance of an image: writable layer on top. Your shell and installs live here until you commit or discard.

Registry = library

Where images are stored. Docker Hub, NVIDIA NGC (nvcr.io), GitHub Container Registry (ghcr.io).

Volume = shared folder

-v ~/workspace:/workspace maps a host folder into the container, so files written there live on the host and survive container removal.

Container lifecycle

Image (on registry) → Running (docker run) → Stopped (exit) → Committed (docker commit) → New image (saved)

Core concepts

Images vs containers

# An image is a snapshot — read only
docker images                                  # list images
docker pull nvcr.io/nvidia/pytorch:25.03-py3   # download image

# A container is a running (or stopped) instance
docker ps                                      # running containers
docker ps -a                                   # all incl. stopped

# Many containers can run from the same image
docker run -it IMAGE bash                      # new container each time

Volumes — how files persist

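# Without a mount: files stay in the container's writable layer
# and are lost when the container is removed
docker run -it IMAGE bash
echo "hello" > /tmp/test.txt
exit   # test.txt is gone once the container is removed (docker rm, or --rm)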

# With a mount — files on host persist
docker run -it -v ~/workspace:/workspace IMAGE bash
echo "hello" > /workspace/test.txt
exit   # still at ~/workspace/test.txt on host
Mount ~/workspace into /workspace on every long-lived dev container so repos and artifacts survive container removal.
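
A minimal sketch of a helper that prepares the mount flag before docker run. It assumes you want the host folder created by your own user first: with -v, Docker creates a missing host path itself, and that folder ends up owned by root. It also resolves the path to an absolute one, because a bare relative name after -v is treated as a named volume rather than a folder. mount_arg is a hypothetical helper name, not a Docker command.

```shell
# mount_arg HOST_DIR [CONTAINER_DIR]
# Hypothetical helper: prints a "-v HOST:CONTAINER" flag for docker run.
mount_arg() (
  host_dir=$1
  container_dir=${2:-/workspace}
  # Create the folder as the current user so Docker does not create it as root.
  mkdir -p "$host_dir" || exit 1
  # Resolve to an absolute path; a bare relative name would become a named volume.
  host_abs=$(cd "$host_dir" && pwd) || exit 1
  printf '%s\n' "-v ${host_abs}:${container_dir}"
)
```

Usage, assuming the path contains no spaces (the substitution is left unquoted on purpose so it splits into two words): docker run -it $(mount_arg "$HOME/workspace") IMAGE bash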

Ports — accessing services from host

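# -p HOST_PORT:CONTAINER_PORT
docker run -it -p 7007:7007 IMAGE bash
# nerfstudio viewer at http://localhost:7007 once you start it inside the container

docker run -p 8888:8888 IMAGE jupyter lab --ip=0.0.0.0
# JupyterLab at http://localhost:8888
# --ip=0.0.0.0 makes Jupyter listen on the published port; add --allow-root if the container runs as root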
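
By default -p HOST:CONTAINER publishes the port on every host interface, so the service is reachable from the rest of the network. A minimal sketch of a helper that builds a loopback-only mapping instead, assuming you only need the service from the Spark itself or through an SSH tunnel. publish_local is a hypothetical helper name.

```shell
# publish_local CONTAINER_PORT [HOST_PORT]
# Hypothetical helper: prints a -p flag bound to 127.0.0.1 only.
publish_local() (
  container_port=$1
  host_port=${2:-$1}
  # Reject empty or non-numeric ports before they reach docker run.
  case "$container_port$host_port" in
    ''|*[!0-9]*) echo "ports must be numeric" >&2; exit 1 ;;
  esac
  printf '%s\n' "-p 127.0.0.1:${host_port}:${container_port}"
)
```

Usage: docker run $(publish_local 8888) IMAGE jupyter lab --ip=0.0.0.0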

Inspect commands

What is running

docker ps                                                        # Running containers: ID, image, name, uptime, ports
docker ps -a                                                     # All containers including stopped
docker ps --format "table {{.Names}}\t{{.Image}}\t{{.Status}}"   # Clean readable table

Resource usage

docker stats               # Live CPU + memory for all running containers
docker stats --no-stream   # Single snapshot, not live
docker top CONTAINER       # Processes running inside container

Inspect details

docker inspect CONTAINER                         # Full JSON: mounts, env vars, network, everything
docker inspect -f '{{json .Mounts}}' CONTAINER   # Show mounted volumes (grep -A2 cuts the entry off after two lines)
docker port CONTAINER                            # Show port mappings
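
docker inspect accepts a Go template through --format, which avoids grepping the full JSON. A minimal sketch of two wrappers; the function names are hypothetical, while the template fields (.State.Status, .Mounts, .Source, .Destination) are the ones docker inspect prints in its JSON output.

```shell
# container_status CONTAINER
# Prints the state, for example created, running, or exited.
container_status() {
  docker inspect --format '{{.State.Status}}' "$1"
}

# container_mounts CONTAINER
# Prints one "source -> destination" line per mount.
container_mounts() {
  docker inspect --format \
    '{{range .Mounts}}{{println .Source "->" .Destination}}{{end}}' "$1"
}
```

Usage: container_status gsplat-dev, then container_mounts gsplat-dev to confirm the workspace mount is attached.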

Logs

docker logs CONTAINER             # All stdout/stderr output
docker logs -f CONTAINER          # Follow logs live (like tail -f)
docker logs --tail 50 CONTAINER   # Last 50 lines only

Run commands

Standard DGX Spark run

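# --name gsplat-dev          give it a name, always
# --ipc=host                 shared memory for PyTorch
# --ulimit memlock=-1        no memory lock limit
# --ulimit stack=67108864    64 MB stack
# -v ~/workspace:/workspace  mount workspace
# -p 7007:7007               nerfstudio viewer port
# (a comment after a trailing backslash breaks the line continuation, so the notes sit up here)
docker run --gpus all -it \
  --name gsplat-dev \
  --ipc=host \
  --ulimit memlock=-1 \
  --ulimit stack=67108864 \
  -v ~/workspace:/workspace \
  -p 7007:7007 \
  vertexnova/gsplat-spark:v1 bash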
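
If you type this run line often, a small wrapper keeps the flags in one place and lets each flag carry its own comment. A minimal sketch, assuming the same flags, mount, and port as the standard run; spark_run and the DRY_RUN variable are hypothetical names introduced here, not part of Docker.

```shell
# spark_run NAME IMAGE [COMMAND]
# Hypothetical wrapper around the standard DGX Spark run line.
# Set DRY_RUN=1 to print the command instead of running it.
spark_run() (
  name=$1
  image=$2
  cmd=${3:-bash}
  # Build the argument list one flag at a time so each can be commented.
  set -- run
  set -- "$@" --gpus all -it                       # GPU access, interactive shell
  set -- "$@" --name "$name"                       # named so it can be restarted
  set -- "$@" --ipc=host                           # host shared memory for PyTorch
  set -- "$@" --ulimit memlock=-1                  # no memory lock limit
  set -- "$@" --ulimit stack=67108864              # 64 MB stack
  set -- "$@" -v "$HOME/workspace:/workspace"      # workspace mount
  set -- "$@" -p 7007:7007                         # nerfstudio viewer port
  set -- "$@" "$image" "$cmd"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf 'docker %s\n' "$*"
  else
    docker "$@"
  fi
)
```

Usage: DRY_RUN=1 spark_run gsplat-dev vertexnova/gsplat-spark:v1 prints the full command for review; drop DRY_RUN to run it.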

Common run flags

--gpus all           Expose all GPUs. Required for CUDA.
-it                  Interactive + TTY. Gives you a live shell.
--name NAME          Memorable name. Always set this.
--rm                 Auto-delete on exit. Use for throwaway runs.
-d                   Detached: run in background.
-e VAR=value         Set environment variable inside container.
-v HOST:CONTAINER    Mount a folder.
-p HOST:CONTAINER    Publish a container port on the host.

Re-enter a stopped container

# Container exited — restart and get shell back
docker start gsplat-dev
docker exec -it gsplat-dev bash

# Or in one line
docker start -ai gsplat-dev
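
The two paths above (already running, or stopped) can be folded into one helper that checks the state first. A minimal sketch, assuming the container's main process is an interactive shell so that docker start keeps it alive; enter is a hypothetical helper name.

```shell
# enter NAME
# Hypothetical helper: start the container if needed, then open a shell in it.
enter() (
  name=$1
  # .State.Running is "true" or "false"; inspect fails if the container does not exist.
  running=$(docker inspect --format '{{.State.Running}}' "$name" 2>/dev/null) || {
    echo "no container named $name" >&2
    exit 1
  }
  if [ "$running" != "true" ]; then
    docker start "$name" >/dev/null || exit 1
  fi
  docker exec -it "$name" bash
)
```

Usage: enter gsplat-dev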

Modify & save containers

The workflow: run → install things → commit to new image → use that image next session.

The modify → save workflow

# 1. Start your container
docker run --gpus all -it --name gsplat-dev \
  --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 \
  -v ~/workspace:/workspace \
  vertexnova/gsplat-spark:v1 bash

# 2. Inside container — install something new
pip install open3d trimesh plyfile

# 3. Exit
exit

# 4. Commit — saves changes as new image
docker commit gsplat-dev vertexnova/gsplat-spark:v2

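# 5. Next session: remove the old container to free the name, then start from v2
docker rm gsplat-dev
docker run --gpus all -it --name gsplat-dev \
  --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 \
  -v ~/workspace:/workspace \
  vertexnova/gsplat-spark:v2 bash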
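
Each commit in this workflow bumps the version tag by one. A minimal sketch of a helper that derives the next tag from the current one, assuming tags follow the vN pattern used on this page with no leading zeros; next_tag is a hypothetical helper name.

```shell
# next_tag REPO:vN
# Hypothetical helper: prints REPO:v(N+1).
next_tag() (
  ref=$1
  case "$ref" in
    *:*) ;;
    *) echo "expected REPO:vN, got: $ref" >&2; exit 1 ;;
  esac
  repo=${ref%:*}     # everything before the last colon
  tag=${ref##*:}     # everything after the last colon
  n=${tag#v}
  # The tag must be "v" followed by digits only.
  case "$tag" in v*) ;; *) n="" ;; esac
  case "$n" in
    ''|*[!0-9]*) echo "expected a tag like v1, got: $tag" >&2; exit 1 ;;
  esac
  printf '%s:v%s\n' "$repo" "$((n + 1))"
)
```

Usage: docker commit gsplat-dev "$(next_tag vertexnova/gsplat-spark:v1)"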

Exec into a running container

docker exec -it CONTAINER bash          # Open shell in already-running container
docker exec CONTAINER pip install PKG   # Install without entering shell
docker exec CONTAINER nvidia-smi        # Run one command and exit

Copy files in/out

docker cp CONTAINER:/workspace/file.py ~/     # Copy file from container to host
docker cp ~/script.py CONTAINER:/workspace/   # Copy file from host into container
docker rename OLD NEW                         # Rename a container without recreating
Only bind-mounted paths (for example /workspace) outlive the container. Everything else sits in the container's writable layer and disappears with docker rm unless you commit it first.

Image management

List & inspect

docker images                     # All local images: name, tag, size, age
docker images | grep vertexnova   # Your saved images only
docker history IMAGE              # Show layers that make up an image
docker image inspect IMAGE        # Full metadata JSON

Tag & version

# Convention: namespace/PURPOSE-spark:vN
docker tag vertexnova/gsplat-spark:v1 vertexnova/gsplat-spark:stable

# Example local images
vertexnova/gsplat-spark:v1   # 3DGS workspace
vertexnova/aiml-spark:v1     # general ML workspace

Remove images

docker rmi IMAGE     # Delete an image (only if no containers use it)
docker image prune   # Remove dangling untagged images only. Safe.
Avoid docker rmi on tagged images you still need; prefer docker image prune for dangling layers only.

Cleanup & disk management

The NVIDIA PyTorch image is 21.8GB. Your gsplat-spark image is 21.9GB. Regular cleanup matters.

Check disk first

docker system df
docker images --format "table {{.Repository}}:{{.Tag}}\t{{.Size}}"

Safe cleanup

docker container prune   # Remove all stopped containers. Safe.
docker image prune       # Remove dangling untagged images only. Safe.
docker system prune      # Stopped containers + dangling images + unused networks + build cache. Safe.

Safe cleanup sequence

# Step 1: see what's there
docker system df

# Step 2: remove stopped containers
docker container prune

# Step 3: remove dangling images only
docker image prune

# Step 4: verify your saved images are intact
docker images | grep vertexnova
docker system prune -a removes every image that no container uses, including tagged ones. After docker container prune that can mean all of your saved images, so only run it after reviewing docker images.
Safe sequence: docker system df → docker container prune → docker image prune → re-list images.
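
The safe sequence can be kept as one function so the order never changes. A minimal sketch, assuming you want a preview mode before anything is deleted; safe_cleanup and the DRY_RUN variable are hypothetical names. The prune commands are left without -f, so Docker still asks for confirmation on a real run.

```shell
# safe_cleanup
# Hypothetical helper: runs the safe cleanup steps in order.
# Set DRY_RUN=1 to print each step instead of running it.
safe_cleanup() (
  run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
      printf '+ %s\n' "$*"
    else
      "$@"
    fi
  }
  run docker system df          # see what is there
  run docker container prune    # stopped containers only
  run docker image prune        # dangling images only
  run docker images             # confirm saved images are intact
)
```

Usage: DRY_RUN=1 safe_cleanup to preview, then safe_cleanup to run the steps.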

On this Spark

Image names and run lines match the DGX Containers guide. Adjust hostnames and ports to your network.

Saved images

vertexnova/gsplat-spark:v1      # gsplat 1.5.3 · PyTorch 2.7 · CUDA 13 · ARM64
vertexnova/aiml-spark:v1        # CV + notebooks + 3D helpers
vertexnova/vertexnova-spark:v1  # planned — engine toolchain

Launch gsplat container

docker run --gpus all -it \
  --name gsplat-dev \
  --ipc=host \
  --ulimit memlock=-1 \
  --ulimit stack=67108864 \
  -v ~/workspace:/workspace \
  -p 7007:7007 \
  vertexnova/gsplat-spark:v1 bash

Re-enter if already created

docker ps -a | grep gsplat-dev
docker start gsplat-dev
docker exec -it gsplat-dev bash

Verify GPU inside container

nvidia-smi
python3 -c "import torch; print(torch.cuda.get_device_name(0))"
python3 -c "import gsplat; print('gsplat', gsplat.__version__)"

Health checks after reboot

# 1. GPU driver OK
nvidia-smi

# 2. Docker GPU access OK
docker run --rm --gpus all \
  nvcr.io/nvidia/cuda:13.0.0-base-ubuntu24.04 nvidia-smi

# 3. Your images intact
docker images | grep vertexnova

# 4. Optional — services you expect to stay up
docker ps | grep YOUR_SERVICE
After a reboot, check in this order: driver, GPU in Docker, your images listed, then any always-on services you rely on.
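
The first three checks can be scripted so a reboot ends with a short pass/fail report. A minimal sketch, assuming the CUDA base image tag and the vertexnova image prefix used on this page; check, has_saved_images, and spark_health are hypothetical helper names. The optional service check is left out because it depends on what you run.

```shell
# check LABEL COMMAND...
# Runs a command quietly and prints ok or FAIL with the label.
check() {
  check_label=$1
  shift
  if "$@" >/dev/null 2>&1; then
    printf 'ok: %s\n' "$check_label"
  else
    printf 'FAIL: %s\n' "$check_label"
    return 1
  fi
}

# True if at least one local image name contains "vertexnova".
has_saved_images() {
  docker images | grep -q vertexnova
}

# spark_health
# Runs every check, then exits non-zero if any of them failed.
spark_health() (
  status=0
  check "GPU driver" nvidia-smi || status=1
  check "GPU in Docker" docker run --rm --gpus all \
    nvcr.io/nvidia/cuda:13.0.0-base-ubuntu24.04 nvidia-smi || status=1
  check "saved images" has_saved_images || status=1
  exit "$status"
)
```

Usage: spark_health && echo "Spark ready"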