Docker on Spark
Commands and patterns · DGX Spark
Use this page next to DGX Containers when you need exact flags, inspect commands, or cleanup order.
What is Docker
Docker packages software and all its dependencies into a container — a lightweight isolated environment that runs the same way on any machine.
On the DGX Spark this matters especially because the machine runs ARM64 + CUDA 13. Most ML packages don't have pre-built wheels for this combo. NVIDIA ships optimised containers that work out of the box.
Core terms
Image: read-only template made of filesystem layers and metadata. Created by pull or build; you do not edit it in place.
Container: runnable instance of an image with a writable layer on top. Your shell and installs live here until you commit or discard.
Registry: where images are stored. Examples: Docker Hub, NVIDIA NGC (nvcr.io), GitHub Container Registry (ghcr.io).
Volume mount: -v ~/workspace:/workspace maps a host folder into the container so files survive after exit.
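One practical detail behind the mount: if the host folder does not exist yet, Docker typically creates it for you, owned by root, which can make it awkward to write to from the host. A minimal sketch, assuming a hypothetical helper named workspace_mount_arg (not part of Docker), that creates the folder first and prints the matching -v argument:

```shell
# Hypothetical helper, not a Docker command: create the host folder as the
# current user, then print the -v argument that maps it into the container.
workspace_mount_arg() {
  local host_dir="${1:-$HOME/workspace}"    # host side of the mount
  local container_dir="${2:-/workspace}"    # path seen inside the container
  mkdir -p "$host_dir" || return 1
  printf '%s %s:%s\n' -v "$host_dir" "$container_dir"
}

# Usage (assumes the host path has no spaces, because the output is word-split):
#   docker run -it $(workspace_mount_arg) IMAGE bash
```

Creating the folder up front keeps its ownership with your user; the defaults mirror the ~/workspace and /workspace paths used throughout this page.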
Container lifecycle
Core concepts
Images vs containers
# An image is a snapshot — read only
docker images # list images
docker pull nvcr.io/nvidia/pytorch:25.03-py3 # download image
# A container is a running (or stopped) instance
docker ps # running containers
docker ps -a # all incl. stopped
# Many containers can run from the same image
docker run -it IMAGE bash # new container each time
Volumes: how files persist
# Without a mount — files are LOST on exit
docker run -it IMAGE bash
echo "hello" > /tmp/test.txt
exit # test.txt is gone
# With a mount — files on host persist
docker run -it -v ~/workspace:/workspace IMAGE bash
echo "hello" > /workspace/test.txt
exit # still at ~/workspace/test.txt on host
Mount ~/workspace into /workspace on every long-lived dev container so repos and artifacts survive container removal.
Ports: accessing services from host
# -p HOST_PORT:CONTAINER_PORT
docker run -it -p 7007:7007 IMAGE bash
# nerfstudio viewer at http://localhost:7007
docker run -p 8888:8888 IMAGE jupyter lab
# JupyterLab at http://localhost:8888
Inspect commands
What is running
docker ps # Running containers: ID, image, name, uptime, ports
docker ps -a # All containers including stopped
docker ps --format "table {{.Names}}\t{{.Image}}\t{{.Status}}" # Clean readable table
Resource usage
docker stats # Live CPU + memory for all running containers
docker stats --no-stream # Single snapshot, not live
docker top CONTAINER # Processes running inside container
Inspect details
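The grep filter listed in this section depends on how the JSON happens to be laid out. A Go template passed to docker inspect is a steadier way to pull out one field. A minimal sketch, assuming a hypothetical wrapper named show_mounts:

```shell
# Hypothetical helper: print each mount of a container as "SOURCE -> DESTINATION".
# Uses docker inspect's Go-template output instead of grepping the JSON.
show_mounts() {
  docker inspect --format '{{range .Mounts}}{{println .Source "->" .Destination}}{{end}}' "$1"
}

# Usage: show_mounts gsplat-dev
```

The same --format approach works for other fields in the inspect output, so it is worth trying before reaching for grep.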
docker inspect CONTAINER # Full JSON: mounts, env vars, network, everything
docker inspect CONTAINER | grep -A2 Mounts # Show mounted volumes
docker port CONTAINER # Show port mappings
Logs
docker logs CONTAINER # All stdout/stderr output
docker logs -f CONTAINER # Follow logs live (like tail -f)
docker logs --tail 50 CONTAINER # Last 50 lines only
Run commands
Standard DGX Spark run
# Flags below, in order: name the container (always), share host IPC for
# PyTorch shared memory, remove the memory lock limit, set a 64 MB stack,
# mount the workspace, publish the nerfstudio viewer port.
docker run --gpus all -it \
--name gsplat-dev \
--ipc=host \
--ulimit memlock=-1 \
--ulimit stack=67108864 \
-v ~/workspace:/workspace \
-p 7007:7007 \
vertexnova/gsplat-spark:v1 bash
Common run flags
--gpus all          Expose all GPUs. Required for CUDA.
-it                 Interactive + TTY. Gives you a live shell.
--name NAME         Memorable name. Always set this.
--rm                Auto-delete on exit. Use for throwaway runs.
-d                  Detached: run in background.
-e VAR=value        Set environment variable inside container.
-v HOST:CONTAINER   Mount a folder.
-p HOST:CONTAINER   Expose a port from container to host.
Re-enter a stopped container
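Which command re-enters a container depends on whether it is stopped or still running. A minimal sketch, assuming a hypothetical helper named enter_container and a bash binary inside the image, that checks the state first and only starts the container when needed:

```shell
# Hypothetical helper: open a shell in a named container, starting it first
# if it exists but is not running.
enter_container() {
  local name="$1"
  local running
  # docker inspect prints "true" or "false" and fails if the container is missing.
  running="$(docker inspect -f '{{.State.Running}}' "$name" 2>/dev/null)" || {
    echo "no such container: $name" >&2
    return 1
  }
  if [ "$running" != "true" ]; then
    docker start "$name" >/dev/null || return 1
  fi
  docker exec -it "$name" bash
}

# Usage: enter_container gsplat-dev
```

Because the shell comes from docker exec, leaving it does not stop the container.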
# Container exited — restart and get shell back
docker start gsplat-dev
docker exec -it gsplat-dev bash
# Or in one line
docker start -ai gsplat-dev
Modify & save containers
The workflow: run → install things → commit to new image → use that image next session.
The modify → save workflow
# 1. Start your container
docker run --gpus all -it --name gsplat-dev \
--ipc=host --ulimit memlock=-1 --ulimit stack=67108864 \
-v ~/workspace:/workspace \
vertexnova/gsplat-spark:v1 bash
# 2. Inside container — install something new
pip install open3d trimesh plyfile
# 3. Exit
exit
# 4. Commit — saves changes as new image
docker commit gsplat-dev vertexnova/gsplat-spark:v2
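# 5. Next session: remove the old container to free its name, then start from v2
docker rm gsplat-dev
docker run --gpus all -it --name gsplat-dev \
--ipc=host --ulimit memlock=-1 --ulimit stack=67108864 \
-v ~/workspace:/workspace \
vertexnova/gsplat-spark:v2 bash
Exec into a running container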
docker exec -it CONTAINER bash # Open shell in already-running container
docker exec CONTAINER pip install PKG # Install without entering shell
docker exec CONTAINER nvidia-smi # Run one command and exit
Copy files in/out
docker cp CONTAINER:/workspace/file.py ~/ # Copy file from container to host
docker cp ~/script.py CONTAINER:/workspace/ # Copy file from host into container
docker rename OLD NEW # Rename a container without recreating
Files in a mounted folder (/workspace) persist without a commit. Everything else in the container layer is ephemeral.
Image management
List & inspect
docker images # All local images: name, tag, size, age
docker images | grep vertexnova # Your saved images only
docker history IMAGE # Show layers that make up an image
docker image inspect IMAGE # Full metadata JSON
Tag & version
# Convention: namespace/PURPOSE-spark:vN
docker tag vertexnova/gsplat-spark:v1 vertexnova/gsplat-spark:stable
# Example local images
vertexnova/gsplat-spark:v1 # 3DGS workspace
vertexnova/aiml-spark:v1 # general ML workspace
Remove images
docker rmi IMAGE # Delete an image (no containers using it)
docker image prune # Remove dangling untagged images only. Safe.
Avoid docker rmi on tagged images you still need; prefer docker image prune for dangling layers only.
Cleanup & disk management
The NVIDIA PyTorch image is 21.8GB. Your gsplat-spark image is 21.9GB. Regular cleanup matters.
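Before pruning anything, it can help to preview roughly what the prune commands would remove. A minimal sketch, assuming a hypothetical helper named preview_prune; it only lists and deletes nothing:

```shell
# Hypothetical helper: list what the prune commands would likely remove.
# Read-only; nothing is deleted.
preview_prune() {
  echo "Stopped containers (candidates for docker container prune):"
  docker ps -a \
    --filter status=exited --filter status=created --filter status=dead \
    --format '{{.Names}}\t{{.Image}}\t{{.Status}}'
  echo "Dangling images (candidates for docker image prune):"
  docker images --filter dangling=true --format '{{.ID}}\t{{.Size}}'
}

# Usage: preview_prune
```

If a container you still want appears in the first list, start it or commit it before running any prune command.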
Check disk first
docker system df
docker images --format "table {{.Repository}}:{{.Tag}}\t{{.Size}}"
Safe cleanup
docker container prune # Remove all stopped containers. Safe.
docker image prune # Remove dangling untagged images only. Safe.
docker system prune # Stopped containers + dangling images + unused networks + build cache. Safe.
Safe cleanup sequence
# Step 1: see what's there
docker system df
# Step 2: remove stopped containers
docker container prune
# Step 3: remove dangling images only
docker image prune
# Step 4: verify your saved images are intact
docker images | grep vertexnova
docker system prune -a removes every unused image, including tagged ones. Only use it after reviewing the output of docker images.
Order: docker system df → docker container prune → docker image prune → re-list images.
On this Spark
Image names and run lines aligned with the DGX Containers guide. Adjust hostnames and ports to your network.
Saved images
vertexnova/gsplat-spark:v1 # gsplat 1.5.3 · PyTorch 2.7 · CUDA 13 · ARM64
vertexnova/aiml-spark:v1 # CV + notebooks + 3D helpers
vertexnova/vertexnova-spark:v1 # planned: engine toolchain
Launch gsplat container
docker run --gpus all -it \
--name gsplat-dev \
--ipc=host \
--ulimit memlock=-1 \
--ulimit stack=67108864 \
-v ~/workspace:/workspace \
-p 7007:7007 \
vertexnova/gsplat-spark:v1 bash
Re-enter if already created
docker ps -a | grep gsplat-dev
docker start gsplat-dev
docker exec -it gsplat-dev bash
Verify GPU inside container
nvidia-smi
python3 -c "import torch; print(torch.cuda.get_device_name(0))"
python3 -c "import gsplat; print('gsplat', gsplat.__version__)"
Health checks after reboot
# 1. GPU driver OK
nvidia-smi
# 2. Docker GPU access OK
docker run --rm --gpus all \
nvcr.io/nvidia/cuda:13.0.0-base-ubuntu24.04 nvidia-smi
# 3. Your images intact
docker images | grep vertexnova
# 4. Optional — services you expect to stay up
docker ps | grep YOUR_SERVICE
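The first three checks above can be wrapped into one script that prints a pass/fail line per step. A minimal sketch, assuming hypothetical function names (check, images_present, spark_health_check) and reusing the CUDA image tag and the vertexnova/ prefix from this page:

```shell
# Image tag copied from the health-check commands on this page.
CUDA_IMAGE="nvcr.io/nvidia/cuda:13.0.0-base-ubuntu24.04"

# Run a command quietly and print OK or FAIL with a label.
check() {
  local label="$1"; shift
  if "$@" >/dev/null 2>&1; then
    echo "OK    $label"
  else
    echo "FAIL  $label"
    return 1
  fi
}

# Succeeds if at least one local image lives under the vertexnova/ namespace.
images_present() {
  docker images --format '{{.Repository}}:{{.Tag}}' | grep -q '^vertexnova/'
}

# Hypothetical entry point: runs every check, returns non-zero if any failed.
spark_health_check() {
  local failed=0
  check "GPU driver" nvidia-smi || failed=1
  check "Docker GPU access" docker run --rm --gpus all "$CUDA_IMAGE" nvidia-smi || failed=1
  check "saved images" images_present || failed=1
  return "$failed"
}

# Usage: spark_health_check
```

Every check runs even when an earlier one fails, so a single pass shows everything that needs attention after a reboot.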