Commands Reference
Current reference for public GPU CLI commands
This page tracks the public GPU CLI surface that ships today. Use gpu --help for a quick overview, gpu --help-all for advanced and hidden commands, and see LLM Inference for the routed LLM workflow.
gpu run
Run any command on a remote GPU pod.
gpu run <COMMAND>
Common examples
gpu run python train.py
gpu run -d python long_job.py
gpu run -p 8080:8080 python server.py
gpu run --gpu-type "RTX 4090" python train.py
gpu run --attach job_abc123
Key flags
| Flag | Description |
|---|---|
--attach, -a <JOB_ID> | Reattach to an existing job |
--detach, -d | Submit the job and return immediately |
--status, -s | Show current pod state and recent jobs |
--cancel <JOB_ID> | Cancel a running job |
--tail, -n <N> | Show last N lines when attaching to a completed job |
--interactive, -i | Allocate a PTY and keep stdin open |
--gpu-type <TYPE> | Override GPU type for this run |
--gpu-count <N> | Request multiple GPUs (1-8) |
--min-vram <GB> | Set fallback VRAM floor |
--rebuild | Force pod recreation when the Dockerfile has changed |
--output, -o <PATH> | Override synced output paths |
--no-output | Disable output syncing |
--sync | Wait for output sync before exiting |
--outputs | Show live daemon-managed output sync events |
--output-summary | Show a brief sync summary on completion |
--no-summary | Suppress the end-of-run summary banner |
--show-sync | Show detailed sync progress |
--force-sync | Ignore change detection and do a full sync |
--remote-path <PATH> | Override the remote workspace path |
--publish, -p <[LOCAL:]REMOTE> | Publish remote ports to localhost |
--no-port-forward | Disable automatic port detection |
--no-persistent-proxy | Stop the local proxy when the pod stops |
--env, -e <KEY=VALUE> | Set environment variables |
--json | JSON output after submit; only works with --detach |
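The --publish value's [LOCAL:]REMOTE shape can be illustrated with a small parser. This is a hypothetical sketch of the semantics, not CLI source; in particular, the assumption that an omitted LOCAL defaults to the remote port is ours:

```python
def parse_publish(spec: str) -> tuple[int, int]:
    """Parse a --publish value of the form [LOCAL:]REMOTE.

    Assumption: when LOCAL is omitted, the local port defaults
    to the remote port (as with `-p 8080:8080` vs `-p 8080`).
    """
    parts = spec.split(":")
    if len(parts) == 1:
        remote = int(parts[0])
        return remote, remote
    if len(parts) == 2:
        return int(parts[0]), int(parts[1])
    raise ValueError(f"invalid publish spec: {spec!r}")

# -p 8080:8080 forwards localhost:8080 to pod port 8080
assert parse_publish("8080:8080") == (8080, 8080)
# -p 3000:8000 forwards localhost:3000 to pod port 8000
assert parse_publish("3000:8000") == (3000, 8000)
```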
gpu serve
Serve ML models on cloud GPUs with automatic configuration. Resolves the right GPU, engine, and quantization for your model.
gpu serve meta-llama/Llama-3.3-70B-Instruct --dry-run
gpu serve llama-70b --quantize int4 --budget 2.00 --dry-run
gpu serve deepseek-r1:70b --users 50 -y
gpu serve llama-8b whisper-large-v3 --dry-run
Key flags
| Flag | Description |
|---|---|
--dry-run | Show GPU recommendation without launching |
-y, --yes | Skip confirmation and launch immediately |
--quantize <FORMAT> | Quantization: fp16, int8, int4, gguf-q4, etc. |
--gpu <TYPE> | Override GPU type (e.g., h100, a100) |
--gpu-count <N> | Override GPU count (1–8) |
--engine <ENGINE> | Override engine: vllm, ollama, llamacpp |
--context <TOKENS> | Override context window size |
--users <N> | Target concurrent users (affects GPU sizing) |
--budget <DOLLARS> | Maximum hourly cost; shows comparison table |
--optimize <TARGET> | Optimization: cost, latency, balanced |
Supports LLMs, diffusion models (FLUX, SDXL), STT (Whisper), and TTS (Kokoro). See the model catalog for supported models and aliases.
gpu attach
Reattach to a job by ID. This is a focused alias for gpu run --attach.
gpu attach job_abc123
gpu attach job_abc123 --tail 50
gpu cp
Copy files to or from the remote workspace.
gpu cp model.pt gpu:/workspace/
gpu cp gpu:/workspace/results/ ./results
gpu cp -r ./data gpu:/workspace/data/
Use the gpu: prefix, or the shorthand : prefix, for the remote side.
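As a sketch, the remote-side detection described above could look like the following. This is a hypothetical helper illustrating the prefix convention, not part of the CLI:

```python
def classify_path(path: str) -> tuple[str, str]:
    """Classify a gpu cp argument as local or remote.

    Remote paths use the 'gpu:' prefix or the ':' shorthand;
    everything else is treated as a local path.
    """
    if path.startswith("gpu:"):
        return ("remote", path[len("gpu:"):])
    if path.startswith(":"):
        return ("remote", path[1:])
    return ("local", path)

assert classify_path("gpu:/workspace/") == ("remote", "/workspace/")
assert classify_path(":/workspace/data") == ("remote", "/workspace/data")
assert classify_path("model.pt") == ("local", "model.pt")
```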
gpu login
Authenticate with GPU CLI in your browser.
gpu login
| Flag | Description |
|---|---|
--timeout <SECONDS> | Browser auth timeout (default: 300) |
gpu logout
Remove the stored browser session.
gpu logout
| Flag | Description |
|---|---|
--yes, -y, --force | Skip confirmation |
gpu auth
Manage provider and model-hub credentials.
auth login
Authenticate with your cloud provider profile.
gpu auth login
gpu auth login --profile dev
gpu auth login --generate-ssh-keys
auth logout
Remove provider credentials.
gpu auth logout
auth status
Show current auth state.
gpu auth status
auth add
Add HuggingFace or Civitai credentials for model downloads.
gpu auth add hf
gpu auth add civitai --token <VALUE>
auth remove
Remove model-hub credentials.
gpu auth remove hf
auth hubs
List configured model-hub credentials.
gpu auth hubs
gpu init
Create a gpu.jsonc file for the current project.
gpu init
| Flag | Description |
|---|---|
--gpu-type <TYPE> | Default GPU type |
--max-price <PRICE> | Maximum hourly price |
--profile <NAME> | Profile to use |
--encryption / --no-encryption | Toggle volume encryption |
--force, -f | Reinitialize an existing config |
gpu inventory
Compare GPU pricing and availability across every cloud you're authenticated with.
gpu inventory # Cross-provider matrix (default)
gpu inventory --provider runpod # Detailed view for one provider
gpu inventory --gpu-type H100 --min-vram 80
gpu inventory --available # Only in-stock GPUs
gpu inventory --show-datacenters # Show DC under each cell
gpu inventory --show-unmapped # Include "Unknown GPU" listings
gpu inventory --json                  # Structured output for scripts
Cross-provider matrix (default)
Without --provider, gpu inventory queries every authenticated cloud
(RunPod, Vast.ai, Thunder Compute, io.net) in parallel and renders a
comparison matrix. Rows are canonical GPU types; columns are providers;
cells show price + stock indicator.
GPU VRAM RunPod Vast.ai io.net
─────────────────────────────────────────────────────────────────────
NVIDIA H100 80GB HBM3 80GB $ 2.99 ███ $ 1.87 ██░ $ 1.85 ██░ *
NVIDIA H100 PCIe 80GB $ 2.39 █░░ ── ─ ── ─
NVIDIA RTX 4090 24GB $ 0.69 ███ $ 0.45 ███ ◆ $ 0.60 ██░
* = best price in row   ◆ = best price + best stock   ── = not offered
A footer summary highlights the winner across all rows: most availability,
lowest average $/hr, and the intersection (or split when no single
provider wins both).
Single-provider mode
Pass --provider <name> (runpod, vastai, thunder, ionet) for the
original detailed table — every offer, datacenter, and stock indicator for
just that cloud. The flat JSON shape is preserved here for backward
compatibility with existing scripts.
Filters
| Flag | Effect |
|---|---|
--available / -a | Only show GPUs with stock |
--min-vram <GB> | Minimum VRAM |
--max-price <USD> | Maximum hourly price |
--region <id> | Filter by datacenter region |
--gpu-type <name> | Fuzzy match on GPU name |
--cloud-type <type> | RunPod-specific: secure, community, all |
--show-datacenters | Append the winning datacenter under each cell |
--show-unmapped | Include rows where the provider couldn't identify the GPU model (mainly Vast.ai) |
Authentication
The matrix only includes providers you've authenticated with. Run
gpu auth status to see which are configured, and gpu auth <provider>
to add one. Providers that error during the fetch (rate limit, bad key,
network) render as an ERR column with the message in the footer — other
providers still surface.
JSON output
--json emits a structured object designed for scripts:
{
"providers": ["runpod", "vastai", "ionet"],
"rows": [
{
"canonical_type": "h100_sxm",
"display_name": "NVIDIA H100 80GB HBM3",
"vram_gb": 80,
"tier": "Flagship",
"cells": {
"runpod": { "price_per_hour": 2.99, "stock_level": "high", "datacenter_id": "us-ca", ... },
"vastai": { "price_per_hour": 1.87, "stock_level": "medium", "datacenter_id": "us-west", ... }
},
"best_price_provider": "vastai",
"best_stock_provider": "runpod"
}
],
"summary": {
"most_availability": "runpod",
"lowest_average_price": "vastai",
"best_overall": null
},
"errors": {},
"filters": { "min_vram": null, ... }
}
When --provider <name> is set, --json returns the legacy flat array
of GPU choices instead — the matrix shape is opt-out.
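Scripts can consume the matrix shape directly. For example, here is a sketch that picks the cheapest provider per GPU row; the field names follow the JSON above, and everything else (function name, sample data) is illustrative:

```python
def cheapest_per_row(matrix: dict) -> dict:
    """Map each row's display_name to the (provider, price) with the lowest price.

    Expects the matrix-shaped output of `gpu inventory --json`:
    rows[].cells maps provider name -> cell with price_per_hour.
    """
    out = {}
    for row in matrix["rows"]:
        priced = {
            p: c for p, c in row["cells"].items()
            if c.get("price_per_hour") is not None
        }
        if not priced:
            continue  # GPU not offered by any provider in this snapshot
        best = min(priced, key=lambda p: priced[p]["price_per_hour"])
        out[row["display_name"]] = (best, priced[best]["price_per_hour"])
    return out

sample = {
    "rows": [{
        "display_name": "NVIDIA H100 80GB HBM3",
        "cells": {
            "runpod": {"price_per_hour": 2.99, "stock_level": "high"},
            "vastai": {"price_per_hour": 1.87, "stock_level": "medium"},
        },
    }]
}
assert cheapest_per_row(sample) == {"NVIDIA H100 80GB HBM3": ("vastai", 1.87)}
```

In practice you would feed it the parsed output of `gpu inventory --json` rather than a literal sample.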
gpu placement
Explain how the cross-provider placement decider would rank candidates for your project — without creating a pod.
gpu placement explain # Human table, current project config
gpu placement explain --json # Machine-readable output
gpu placement explain --providers runpod,vastai # Limit to an allow-list
gpu placement explain --exclude-provider ionet # Repeatable exclusion
gpu placement explain --min-vram 40 --max-price 2.50 # Extra hard filters
gpu placement explain --gpu-type H100 --gpu-type A100  # GPU-type filter (repeatable)
Read-only diagnostic for provider = "auto" / providers = [...] configs.
The decider runs against live inventory plus the project's deployment
footprint (volume-affinity, cache-affinity, per-provider session count)
and surfaces both the ranked candidate list and any hard-filter
rejections.
Human output
A ranked table with one row per (provider, gpu, datacenter) candidate:
Rank Provider GPU DC $/hr Stock Score Reasons
──────────────────────────────────────────────────────────────────────────────────────
1 vastai NVIDIA H100 80GB HBM3 us-west 1.87 ██░ 0.812 price-win, stock-boost
2 runpod NVIDIA H100 80GB HBM3 us-ca 2.99 ███ 0.741 stock-boost
3 ionet NVIDIA H100 PCIe sfo-1 1.85 ██░ 0.698 price-win
Rejections (hard filter):
- runpod / NVIDIA A100: volume_locked (volume datacenter us-ca, candidate in us-east)
- ionet / NVIDIA L40S: price_cap (2.89 > 2.50 max_price)
Flags
| Flag | Effect |
|---|---|
--providers <csv> | Ordered allow-list; empty = all authenticated |
--exclude-provider <name> / -x | Exclude a provider (repeatable) |
--gpu-type <name> / -g | Filter to these GPU types (repeatable) |
--min-vram <GB> | Hard VRAM floor |
--max-price <USD> | Hard price ceiling per hour |
--json | Machine output: { candidates: [...], rejections: [...], inventory_fetched_at_unix_ms: number } |
When the decider runs for real
gpu placement explain is purely diagnostic. The same decider dispatches
automatically during gpu start / gpu run whenever you haven't pinned a
specific provider — which is the default. Pin a provider (e.g. provider = "runpod") to bypass the decider. See
crates/gpu/docs/providers/placement.md
for the full scoring model and how to tune PlacementWeights.
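The --json output of gpu placement explain can be post-processed in scripts. This sketch counts hard-filter rejections by reason code (volume_locked, price_cap, etc.); note that the exact per-rejection field names here are an assumption for illustration:

```python
from collections import Counter

def rejection_summary(report: dict) -> Counter:
    """Count hard-filter rejections by reason code.

    Expects the { candidates: [...], rejections: [...] } shape of
    `gpu placement explain --json`; the per-rejection field names
    ("provider", "gpu", "reason") are assumed for this sketch.
    """
    return Counter(r["reason"] for r in report["rejections"])

report = {
    "candidates": [],
    "rejections": [
        {"provider": "runpod", "gpu": "NVIDIA A100", "reason": "volume_locked"},
        {"provider": "ionet", "gpu": "NVIDIA L40S", "reason": "price_cap"},
        {"provider": "ionet", "gpu": "NVIDIA L4", "reason": "price_cap"},
    ],
}
assert rejection_summary(report) == {"volume_locked": 1, "price_cap": 2}
```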
gpu status
Show project status, active pods, recent jobs, and cost information.
gpu status
gpu status --json
gpu dashboard
Open the TUI dashboard for pods and jobs.
gpu dashboard
Keybindings: j/k to move, Tab to switch panels, Enter to expand, a to attach, c to cancel, s to stop, e for events, ? for help, q to quit.
gpu logs
Stream or inspect unified logs for jobs, sync events, hooks, and agent output.
gpu logs
gpu logs --job job_abc123 --tail 50
gpu logs --follow --type lifecycle
gpu logs --json
gpu stop
Stop the active pod immediately.
gpu stop
gpu stop pod_abc123 --no-sync
| Flag | Description |
|---|---|
[POD_ID] | Optional pod ID |
--yes, -y, --force | Skip confirmation |
--no-sync | Skip final output sync |
--json | Output JSON |
gpu use
Run a template directly or resume a template session.
gpu use ollama
gpu use vllm
gpu use owner/repo@v1.0.0
gpu use
Flags
| Flag | Description |
|---|---|
--name <NAME> | Override the session or project name |
--yes | Skip interactive prompts |
--dry-run | Show what would be created |
--input <KEY=VALUE> | Provide template input values |
See LLM Inference for the built-in Ollama, vLLM, and llama.cpp flows.
gpu llm
Launch the routed LLM workflow for Ollama, vLLM, or llama.cpp, with either local proxy registration or hosted publish.
gpu llm run
gpu llm run --ollama --model deepseek-r1:8b -y
gpu llm run --vllm --url meta-llama/Llama-3.1-8B-Instruct -y
gpu llm run --vllm --model Qwen/Qwen2.5-0.5B-Instruct --publish hosted --name staging-qwen -y
gpu llm info deepseek-r1:70b
gpu llm info --url meta-llama/Llama-3.1-8B-Instruct --json
See LLM Inference for setup, port layout, and wake-on-request routing behavior.
gpu comfyui
Run ComfyUI workflows from the curated workflow catalog.
gpu comfyui list
gpu comfyui info flux_schnell
gpu comfyui validate flux_schnell --gpu-type "RTX 4090"
gpu comfyui run flux_schnell --volume-id vol_abc123
gpu comfyui generate "a cat astronaut on mars"
gpu comfyui stop --all
gpu notebook
Run Marimo notebooks on GPU pods.
gpu notebook
gpu notebook train.py
gpu notebook train.py --run
gpu notebook --new analysis
Common flags
| Flag | Description |
|---|---|
--run, -r | Serve in read-only app mode |
--new <NAME> | Create a new notebook |
--gpu-type <TYPE> | Override GPU selection |
--volume-id <ID> | Use a specific network volume |
--no-volume | Use ephemeral storage |
--yes, -y | Skip confirmation |
gpu volume
Manage network volumes.
volume list
gpu volume list
gpu volume list --detailed
gpu volume list --json
volume create
gpu volume create --name my-models --size 200
gpu volume create --datacenter US-TX-3 --set-global
volume delete
gpu volume delete my-models --force
volume extend
gpu volume extend my-models --size 300
volume set-global
gpu volume set-global my-models
volume status
gpu volume status
gpu volume status --volume my-models --json
volume migrate
gpu volume migrate my-models --to US-OR-1
volume sync
gpu volume sync source-volume dest-volume
gpu volume sync source-volume dest-volume --method rsync
volume cancel-migration
gpu volume cancel-migration transfer_abc123
gpu vault
Manage encrypted vault storage for sensitive outputs.
vault list
gpu vault list
gpu vault list --project my-project --json
vault export
gpu vault export checkpoints/model.pt ./model.pt
gpu vault export generated/image.png ~/Desktop/ --force
vault stats
gpu vault stats
gpu vault stats --project my-project --json
gpu proxy
Manage the local proxy router for OpenAI-compatible model access.
proxy start
gpu proxy start
Starts the local proxy listeners. By default the LLM proxy listens on 127.0.0.1:4000 and the diffusion proxy listens on 127.0.0.1:4001.
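Since the local proxy exposes an OpenAI-compatible API, a registered model can be called with any OpenAI-style client. Here is a minimal sketch using only the standard library; the /v1/chat/completions path is assumed from OpenAI compatibility rather than documented here:

```python
import json
import urllib.request

# Default LLM proxy listener; the path is the standard
# OpenAI-compatible chat endpoint (an assumption of this sketch).
PROXY_URL = "http://127.0.0.1:4000/v1/chat/completions"

def build_chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat request to the local proxy."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        PROXY_URL, data=body, headers={"Content-Type": "application/json"}
    )

req = build_chat_request("deepseek-r1:8b", "hello")
assert req.full_url == PROXY_URL
# urllib.request.urlopen(req) would send it once the proxy is
# running and a backend for the model is registered.
```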
proxy stop
gpu proxy stop
Stops all proxy listeners and health checks.
proxy status
gpu proxy status
gpu proxy status --json
Shows whether the proxy is running, the configured listen addresses, and registered model counts.
proxy models
gpu proxy models
gpu proxy models --json
Lists registered models, backend counts, service types, and healthy backend counts.
proxy register
gpu proxy register --pod-id llm-ollama-qwen3-5-9b --model qwen3.5:9b --port 11434 --type ollama
gpu proxy register --pod-id abc123 --model meta-llama/Llama-3-8B --port 8000 --type vllm
Manually registers a backend when you need to attach an existing service to the local proxy router.
proxy deregister
gpu proxy deregister --pod-id llm-ollama-qwen3-5-9b --model qwen3.5:9b
gpu proxy deregister --pod-id abc123 --model meta-llama/Llama-3-8B
Removes a backend from the proxy router.
gpu proxy hosted
Manage the org-scoped hosted proxy for the active organization.
proxy create
gpu proxy create
gpu proxy create --backend runpod
gpu proxy create --backend vastai
Creates the org-scoped hosted proxy deployment. Use --backend to pin a provider-hosted backend when you do not want the portal-managed Railway path.
proxy hosted status
gpu proxy hosted status
gpu proxy hosted status --json
Shows hosted deployment status, the public connection URL, and version drift details.
proxy hosted connect-info
gpu proxy hosted connect-info
gpu proxy hosted connect-info --json
Prints the connection details for the active org's hosted proxy.
proxy hosted upgrade
gpu proxy hosted upgrade --yes
Requests a hosted-proxy upgrade when drift is detected. Railway remains portal-managed end to end, while provider-hosted backends reconcile the requested gpu-cli version through the running hosted daemon after initial provisioning.
proxy hosted destroy
gpu proxy hosted destroy
Destroys the org-scoped hosted proxy deployment.
proxy keys
gpu proxy keys create --name "alice"
gpu proxy keys list
gpu proxy keys revoke --name "alice"
Manage org-scoped hosted-proxy API keys for the active organization.
gpu config
Inspect or validate configuration.
gpu config show
gpu config validate
gpu config schema
gpu config get default_profile
gpu config set updates.auto_update false
gpu template
Browse official templates or clear the local template cache.
gpu template list
gpu template list --json
gpu template clear-cache -y
gpu org
Manage organizations, membership, sub-accounts, and service accounts.
gpu org list
gpu org create "My Team"
gpu org delete my-team
gpu org switch my-team
gpu org invite alice@example.com --role admin
gpu org service-account create --name ci
gpu org service-account revoke sa_abc123
See Organizations & Service Accounts for the current model, including headless automation.
gpu serverless
Deploy and manage RunPod Serverless endpoints.
gpu serverless deploy
gpu serverless list --json
gpu serverless status <ENDPOINT_ID>
gpu serverless delete
gpu serverless warm <ENDPOINT_ID> --gpu
gpu serverless logs <ENDPOINT_ID>
gpu serverless template delete
serverless status
gpu serverless status <ENDPOINT_ID>
gpu serverless status <ENDPOINT_ID> --json
serverless logs
gpu serverless logs <ENDPOINT_ID>
gpu serverless logs <ENDPOINT_ID> --tail 100
gpu serverless logs <ENDPOINT_ID> --status error
gpu serverless logs currently points you to the RunPod dashboard rather than streaming or filtering logs inside the CLI, but these are still the public CLI flags.
serverless warm
gpu serverless warm <ENDPOINT_ID> --gpu
gpu serverless warm <ENDPOINT_ID> --cpu
gpu serverless warm <ENDPOINT_ID> --timeout 900
Current behavior
- gpu serverless status and gpu serverless warm should be treated as endpoint-ID-driven commands.
- gpu serverless delete supports name lookup and interactive selection when you omit the endpoint.
- gpu serverless warm --gpu is the supported warmup path today. gpu serverless warm --cpu, deploy-time --warm, and deploy-time --write-ids are not wired through the current runtime.
- gpu serverless logs currently points you to the RunPod dashboard rather than streaming filtered logs in the CLI.
Use Serverless Endpoints for templates, config shape, and request examples.
gpu start
Start a new GPU pod explicitly instead of letting gpu run manage pod lifecycle for you.
gpu start
gpu start --gpu-type "NVIDIA RTX 6000 Ada"
gpu start --min-vram 24 --max-price 1.50
Key flags
| Flag | Description |
|---|---|
--gpu-type <TYPE> | Pin a specific GPU type |
--gpu-count <N> | Request multiple GPUs |
--max-price <PRICE> | Limit hourly spend during selection |
--min-vram <GB> | Set a VRAM floor |
--cloud-type <TYPE> | Choose secure, community, or all |
--region <REGION> | Restrict to a region or datacenter |
--docker-image <IMAGE> | Override the image |
--force | Skip reuse checks and create a new pod |
gpu instances
Inspect or terminate provider-side instances directly.
gpu instances list
gpu instances list --project my-project
gpu instances show pod_abc123
gpu instances terminate pod_abc123 -y
gpu desktop
Manage the desktop app.
gpu desktop install
gpu desktop install --channel beta
gpu desktop uninstall -y
gpu update
Update GPU CLI to the latest version or inspect update availability.
gpu update
gpu update --check
gpu update --target-version 1.2.3
gpu update --dismiss
gpu upgrade
Open the pricing flow to upgrade your subscription.
gpu upgrade
gpu changelog
View release notes by version or version range.
gpu changelog
gpu changelog 0.8.0
gpu changelog --from 0.7.0 --to 0.8.0
gpu daemon
Manage the background daemon.
gpu daemon status
gpu daemon start
gpu daemon restart
gpu daemon logs --follow
gpu doctor
Run a setup diagnostic.
gpu doctor
gpu doctor --json
gpu issue
Submit an issue report with optional daemon logs.
gpu issue
gpu issue "sync not working"
gpu issue --no-logs -y "bug report"
gpu agent-docs
Print the agent-focused CLI reference to stdout.
gpu agent-docs
gpu agent-docs | head -50
gpu support
Open the GPU CLI community support link in your browser.
gpu support
Global flags
These flags work with all commands:
| Flag | Description |
|---|---|
-v, --verbose | Increase logging verbosity (-v, -vv, -vvv) |
-q, --quiet | Minimal output |
--progress-style <STYLE> | Progress display style: panel, pipeline, minimal, verbose |
--help-all | Show all commands including hidden ones |
--no-auto-update | Disable the update check for this invocation |