Commands Reference
Current reference for public GPU CLI commands
This page tracks the public GPU CLI surface that ships today. Use gpu --help for a quick overview, gpu --help-all for advanced and hidden commands, and see LLM Inference for the routed LLM workflow.
gpu run
Run any command on a remote GPU pod.
gpu run <COMMAND>
Common examples
gpu run python train.py
gpu run -d python long_job.py
gpu run -p 8080:8080 python server.py
gpu run --gpu-type "RTX 4090" python train.py
gpu run --attach job_abc123
Key flags
| Flag | Description |
|---|---|
--attach, -a <JOB_ID> | Reattach to an existing job |
--detach, -d | Submit the job and return immediately |
--status, -s | Show current pod state and recent jobs |
--cancel <JOB_ID> | Cancel a running job |
--tail, -n <N> | Show last N lines when attaching to a completed job |
--interactive, -i | Allocate a PTY and keep stdin open |
--gpu-type <TYPE> | Override GPU type for this run |
--gpu-count <N> | Request multiple GPUs (1-8) |
--min-vram <GB> | Set fallback VRAM floor |
--rebuild | Force pod recreation when the Dockerfile has changed |
--output, -o <PATH> | Override synced output paths |
--no-output | Disable output syncing |
--sync | Wait for output sync before exiting |
--outputs | Show live daemon-managed output sync events |
--output-summary | Show a brief sync summary on completion |
--no-summary | Suppress the end-of-run summary banner |
--show-sync | Show detailed sync progress |
--force-sync | Ignore change detection and do a full sync |
--remote-path <PATH> | Override the remote workspace path |
--publish, -p <[LOCAL:]REMOTE> | Publish remote ports to localhost |
--no-port-forward | Disable automatic port detection |
--no-persistent-proxy | Stop the local proxy when the pod stops |
--env, -e <KEY=VALUE> | Set environment variables |
--json | JSON output after submit; only works with --detach |
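The --publish value's [LOCAL:]REMOTE shape can be illustrated with a small parser. This is a hypothetical sketch of the semantics, not CLI source; in particular, the assumption that an omitted LOCAL defaults to the remote port is ours:

```python
def parse_publish(spec: str) -> tuple[int, int]:
    """Parse a --publish value of the form [LOCAL:]REMOTE.

    Assumption: when LOCAL is omitted, the local port defaults
    to the remote port (as with `-p 8080:8080` vs `-p 8080`).
    """
    parts = spec.split(":")
    if len(parts) == 1:
        remote = int(parts[0])
        return remote, remote
    if len(parts) == 2:
        return int(parts[0]), int(parts[1])
    raise ValueError(f"invalid publish spec: {spec!r}")

# -p 8080:8080 forwards localhost:8080 to pod port 8080
assert parse_publish("8080:8080") == (8080, 8080)
# -p 3000:8000 forwards localhost:3000 to pod port 8000
assert parse_publish("3000:8000") == (3000, 8000)
```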
gpu serve
Serve ML models on cloud GPUs with automatic configuration. Resolves the right GPU, engine, and quantization for your model.
gpu serve meta-llama/Llama-3.3-70B-Instruct --dry-run
gpu serve llama-70b --quantize int4 --budget 2.00 --dry-run
gpu serve deepseek-r1:70b --users 50 -y
gpu serve llama-8b whisper-large-v3 --dry-run
Key flags
| Flag | Description |
|---|---|
--dry-run | Show GPU recommendation without launching |
-y, --yes | Skip confirmation and launch immediately |
--quantize <FORMAT> | Quantization: fp16, int8, int4, gguf-q4, etc. |
--gpu <TYPE> | Override GPU type (e.g., h100, a100) |
--gpu-count <N> | Override GPU count (1–8) |
--engine <ENGINE> | Override engine: vllm, ollama, llamacpp |
--context <TOKENS> | Override context window size |
--users <N> | Target concurrent users (affects GPU sizing) |
--budget <DOLLARS> | Maximum hourly cost; shows comparison table |
--optimize <TARGET> | Optimization: cost, latency, balanced |
Supports LLMs, diffusion models (FLUX, SDXL), STT (Whisper), and TTS (Kokoro). See the model catalog for supported models and aliases.
gpu attach
Reattach to a job by ID. This is a focused alias for gpu run --attach.
gpu attach job_abc123
gpu attach job_abc123 --tail 50
gpu cp
Copy files to or from the remote workspace.
gpu cp model.pt gpu:/workspace/
gpu cp gpu:/workspace/results/ ./results
gpu cp -r ./data gpu:/workspace/data/
Use the gpu: prefix, or the shorthand : prefix, for the remote side.
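As a sketch, the remote-side detection described above could look like the following. This is a hypothetical helper illustrating the prefix convention, not part of the CLI:

```python
def classify_path(path: str) -> tuple[str, str]:
    """Classify a gpu cp argument as local or remote.

    Remote paths use the 'gpu:' prefix or the ':' shorthand;
    everything else is treated as a local path.
    """
    if path.startswith("gpu:"):
        return ("remote", path[len("gpu:"):])
    if path.startswith(":"):
        return ("remote", path[1:])
    return ("local", path)

assert classify_path("gpu:/workspace/") == ("remote", "/workspace/")
assert classify_path(":/workspace/data") == ("remote", "/workspace/data")
assert classify_path("model.pt") == ("local", "model.pt")
```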
gpu login
Authenticate with GPU CLI in your browser.
gpu login
| Flag | Description |
|---|---|
--timeout <SECONDS> | Browser auth timeout (default: 300) |
gpu logout
Remove the stored browser session.
gpu logout
| Flag | Description |
|---|---|
--yes, -y, --force | Skip confirmation |
gpu auth
Manage provider and model-hub credentials.
auth login
Authenticate with your cloud provider profile.
gpu auth login
gpu auth login --profile dev
gpu auth login --generate-ssh-keys
auth logout
Remove provider credentials.
gpu auth logout
auth status
Show current auth state.
gpu auth status
auth add
Add HuggingFace or Civitai credentials for model downloads.
gpu auth add hf
gpu auth add civitai --token <VALUE>
auth remove
Remove model-hub credentials.
gpu auth remove hf
auth hubs
List configured model-hub credentials.
gpu auth hubs
gpu init
Create a gpu.jsonc file for the current project.
gpu init
| Flag | Description |
|---|---|
--gpu-type <TYPE> | Default GPU type |
--max-price <PRICE> | Maximum hourly price |
--profile <NAME> | Profile to use |
--encryption / --no-encryption | Toggle volume encryption |
--force, -f | Reinitialize an existing config |
gpu inventory
Compare GPU pricing and availability across every cloud you're authenticated with.
gpu inventory # Cross-provider matrix (default)
gpu inventory --provider runpod # Detailed view for one provider
gpu inventory --gpu-type H100 --min-vram 80
gpu inventory --available # Only in-stock GPUs
gpu inventory --show-datacenters # Show DC under each cell
gpu inventory --show-unmapped # Include "Unknown GPU" listings
gpu inventory --json                  # Structured output for scripts
Cross-provider matrix (default)
Without --provider, gpu inventory queries every authenticated cloud
(RunPod, Vast.ai, Thunder Compute, io.net) in parallel and renders a
comparison matrix. Rows are canonical GPU types; columns are providers;
cells show price + stock indicator.
GPU VRAM RunPod Vast.ai io.net
─────────────────────────────────────────────────────────────────────
NVIDIA H100 80GB HBM3 80GB $ 2.99 ███ $ 1.87 ██░ $ 1.85 ██░ *
NVIDIA H100 PCIe 80GB $ 2.39 █░░ ── ─ ── ─
NVIDIA RTX 4090 24GB $ 0.69 ███ $ 0.45 ███ ◆ $ 0.60 ██░
* = best price in row   ◆ = best price + best stock   ── = not offered
A footer summary highlights the winner across all rows: most availability,
lowest average $/hr, and the intersection (or split when no single
provider wins both).
Single-provider mode
Pass --provider <name> (runpod, vastai, thunder, ionet) for the
original detailed table — every offer, datacenter, and stock indicator for
just that cloud. The flat JSON shape is preserved here for backward
compatibility with existing scripts.
Filters
| Flag | Effect |
|---|---|
--available / -a | Only show GPUs with stock |
--min-vram <GB> | Minimum VRAM |
--max-price <USD> | Maximum hourly price |
--region <id> | Filter by datacenter region |
--gpu-type <name> | Fuzzy match on GPU name |
--cloud-type <type> | RunPod-specific: secure, community, all |
--show-datacenters | Append the winning datacenter under each cell |
--show-unmapped | Include rows where the provider couldn't identify the GPU model (mainly Vast.ai) |
Authentication
The matrix only includes providers you've authenticated with. Run
gpu auth status to see which are configured, and gpu auth <provider>
to add one. Providers that error during the fetch (rate limit, bad key,
network) render as an ERR column with the message in the footer — other
providers still surface.
JSON output
--json emits a structured object designed for scripts:
{
"providers": ["runpod", "vastai", "ionet"],
"rows": [
{
"canonical_type": "h100_sxm",
"display_name": "NVIDIA H100 80GB HBM3",
"vram_gb": 80,
"tier": "Flagship",
"cells": {
"runpod": { "price_per_hour": 2.99, "stock_level": "high", "datacenter_id": "us-ca", ... },
"vastai": { "price_per_hour": 1.87, "stock_level": "medium", "datacenter_id": "us-west", ... }
},
"best_price_provider": "vastai",
"best_stock_provider": "runpod"
}
],
"summary": {
"most_availability": "runpod",
"lowest_average_price": "vastai",
"best_overall": null
},
"errors": {},
"filters": { "min_vram": null, ... }
}
When --provider <name> is set, --json returns the legacy flat array
of GPU choices instead — the matrix shape is opt-out.
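Scripts can consume the matrix shape directly. For example, here is a sketch that picks the cheapest provider per GPU row; the field names follow the JSON above, and everything else (function name, sample data) is illustrative:

```python
def cheapest_per_row(matrix: dict) -> dict:
    """Map each row's display_name to the (provider, price) with the lowest price.

    Expects the matrix-shaped output of `gpu inventory --json`:
    rows[].cells maps provider name -> cell with price_per_hour.
    """
    out = {}
    for row in matrix["rows"]:
        priced = {
            p: c for p, c in row["cells"].items()
            if c.get("price_per_hour") is not None
        }
        if not priced:
            continue  # GPU not offered by any provider in this snapshot
        best = min(priced, key=lambda p: priced[p]["price_per_hour"])
        out[row["display_name"]] = (best, priced[best]["price_per_hour"])
    return out

sample = {
    "rows": [{
        "display_name": "NVIDIA H100 80GB HBM3",
        "cells": {
            "runpod": {"price_per_hour": 2.99, "stock_level": "high"},
            "vastai": {"price_per_hour": 1.87, "stock_level": "medium"},
        },
    }]
}
assert cheapest_per_row(sample) == {"NVIDIA H100 80GB HBM3": ("vastai", 1.87)}
```

In practice you would feed it the parsed output of `gpu inventory --json` rather than a literal sample.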
gpu placement
Explain how the cross-provider placement decider would rank candidates for your project — without creating a pod.
gpu placement explain # Human table, current project config
gpu placement explain --json # Machine-readable output
gpu placement explain --providers runpod,vastai # Limit to an allow-list
gpu placement explain --exclude-provider ionet # Repeatable exclusion
gpu placement explain --min-vram 40 --max-price 2.50 # Extra hard filters
gpu placement explain --gpu-type H100 --gpu-type A100  # GPU-type filter (repeatable)
Read-only diagnostic for provider = "auto" / providers = [...] configs.
The decider runs against live inventory plus the project's deployment
footprint (volume-affinity, cache-affinity, per-provider session count)
and surfaces both the ranked candidate list and any hard-filter
rejections.
Human output
A ranked table with one row per (provider, gpu, datacenter) candidate:
Rank Provider GPU DC $/hr Stock Score Reasons
──────────────────────────────────────────────────────────────────────────────────────
1 vastai NVIDIA H100 80GB HBM3 us-west 1.87 ██░ 0.812 price-win, stock-boost
2 runpod NVIDIA H100 80GB HBM3 us-ca 2.99 ███ 0.741 stock-boost
3 ionet NVIDIA H100 PCIe sfo-1 1.85 ██░ 0.698 price-win
Rejections (hard filter):
- runpod / NVIDIA A100: volume_locked (volume datacenter us-ca, candidate in us-east)
- ionet / NVIDIA L40S: price_cap (2.89 > 2.50 max_price)
Flags
| Flag | Effect |
|---|---|
--providers <csv> | Ordered allow-list; empty = all authenticated |
--exclude-provider <name> / -x | Exclude a provider (repeatable) |
--gpu-type <name> / -g | Filter to these GPU types (repeatable) |
--min-vram <GB> | Hard VRAM floor |
--max-price <USD> | Hard price ceiling per hour |
--json | Machine output: { candidates: [...], rejections: [...], inventory_fetched_at_unix_ms: number } |
When the decider runs for real
gpu placement explain is purely diagnostic. The same decider dispatches
automatically during gpu start / gpu run whenever you haven't pinned a
specific provider — which is the default. Pin a provider (e.g. provider = "runpod") to bypass the decider. See
crates/gpu/docs/providers/placement.md
for the full scoring model and how to tune PlacementWeights.
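The --json output of gpu placement explain can be post-processed in scripts. This sketch counts hard-filter rejections by reason code (volume_locked, price_cap, etc.); note that the exact per-rejection field names here are an assumption for illustration:

```python
from collections import Counter

def rejection_summary(report: dict) -> Counter:
    """Count hard-filter rejections by reason code.

    Expects the { candidates: [...], rejections: [...] } shape of
    `gpu placement explain --json`; the per-rejection field names
    ("provider", "gpu", "reason") are assumed for this sketch.
    """
    return Counter(r["reason"] for r in report["rejections"])

report = {
    "candidates": [],
    "rejections": [
        {"provider": "runpod", "gpu": "NVIDIA A100", "reason": "volume_locked"},
        {"provider": "ionet", "gpu": "NVIDIA L40S", "reason": "price_cap"},
        {"provider": "ionet", "gpu": "NVIDIA L4", "reason": "price_cap"},
    ],
}
assert rejection_summary(report) == {"volume_locked": 1, "price_cap": 2}
```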
gpu status
Show project status, active pods, recent jobs, and cost information.
gpu status
gpu status --json
gpu dashboard
Open the TUI dashboard for pods and jobs.
gpu dashboard
Keybindings: j/k to move, Tab to switch panels, Enter to expand, a to attach, c to cancel, s to stop, e for events, ? for help, q to quit.
gpu logs
Stream or inspect unified logs for jobs, sync events, hooks, and agent output.
gpu logs
gpu logs --job job_abc123 --tail 50
gpu logs --follow --type lifecycle
gpu logs --json
gpu stop
Stop the active pod immediately.
gpu stop
gpu stop pod_abc123 --no-sync
| Flag | Description |
|---|---|
[POD_ID] | Optional pod ID |
--yes, -y, --force | Skip confirmation |
--no-sync | Skip final output sync |
--json | Output JSON |
gpu use
Run a template directly or resume a template session.
gpu use ollama
gpu use vllm
gpu use owner/repo@v1.0.0
gpu use
Flags
| Flag | Description |
|---|---|
--name <NAME> | Override the session or project name |
--yes | Skip interactive prompts |
--dry-run | Show what would be created |
--input <KEY=VALUE> | Provide template input values |
See LLM Inference for the built-in Ollama, vLLM, and llama.cpp flows.
gpu llm
Launch the routed LLM workflow for Ollama, vLLM, or llama.cpp, with either local proxy registration or hosted publish.
gpu llm run
gpu llm run --ollama --model deepseek-r1:8b -y
gpu llm run --vllm --url meta-llama/Llama-3.1-8B-Instruct -y
gpu llm run --vllm --model Qwen/Qwen2.5-0.5B-Instruct --publish hosted --name staging-qwen -y
gpu llm info deepseek-r1:70b
gpu llm info --url meta-llama/Llama-3.1-8B-Instruct --json
See LLM Inference for setup, port layout, and wake-on-request routing behavior.
gpu comfyui
Run ComfyUI workflows from the curated workflow catalog.
gpu comfyui list
gpu comfyui info flux_schnell
gpu comfyui validate flux_schnell --gpu-type "RTX 4090"
gpu comfyui run flux_schnell --volume-id vol_abc123
gpu comfyui generate "a cat astronaut on mars"
gpu comfyui stop --all
gpu notebook
Run Marimo notebooks on GPU pods.
gpu notebook
gpu notebook train.py
gpu notebook train.py --run
gpu notebook --new analysis
Common flags
| Flag | Description |
|---|---|
--run, -r | Serve in read-only app mode |
--new <NAME> | Create a new notebook |
--gpu-type <TYPE> | Override GPU selection |
--volume-id <ID> | Use a specific network volume |
--no-volume | Use ephemeral storage |
--yes, -y | Skip confirmation |
gpu volume
Manage network volumes.
volume list
gpu volume list
gpu volume list --detailed
gpu volume list --json
volume create
gpu volume create --name my-models --size 200
gpu volume create --datacenter US-TX-3 --set-global
volume delete
gpu volume delete my-models --force
volume extend
gpu volume extend my-models --size 300
volume set-global
gpu volume set-global my-models
volume status
gpu volume status
gpu volume status --volume my-models --json
volume migrate
gpu volume migrate my-models --to US-OR-1
volume sync
gpu volume sync source-volume dest-volume
gpu volume sync source-volume dest-volume --method rsync
volume cancel-migration
gpu volume cancel-migration transfer_abc123
gpu vault
Manage encrypted vault storage for sensitive outputs.
vault list
gpu vault list
gpu vault list --project my-project --json
vault export
gpu vault export checkpoints/model.pt ./model.pt
gpu vault export generated/image.png ~/Desktop/ --force
vault stats
gpu vault stats
gpu vault stats --project my-project --json
gpu proxy
Manage the local proxy router for OpenAI-compatible model access.
proxy start
gpu proxy start
Starts the local proxy listeners. By default the LLM proxy listens on 127.0.0.1:4000 and the diffusion proxy listens on 127.0.0.1:4001.
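Since the local proxy exposes an OpenAI-compatible API, a registered model can be called with any OpenAI-style client. Here is a minimal sketch using only the standard library; the /v1/chat/completions path is assumed from OpenAI compatibility rather than documented here:

```python
import json
import urllib.request

# Default LLM proxy listener; the path is the standard
# OpenAI-compatible chat endpoint (an assumption of this sketch).
PROXY_URL = "http://127.0.0.1:4000/v1/chat/completions"

def build_chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat request to the local proxy."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        PROXY_URL, data=body, headers={"Content-Type": "application/json"}
    )

req = build_chat_request("deepseek-r1:8b", "hello")
assert req.full_url == PROXY_URL
# urllib.request.urlopen(req) would send it once the proxy is
# running and a backend for the model is registered.
```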
proxy stop
gpu proxy stop
Stops all proxy listeners and health checks.
proxy status
gpu proxy status
gpu proxy status --json
Shows whether the proxy is running, the configured listen addresses, and registered model counts.
proxy models
gpu proxy models
gpu proxy models --json
Lists registered models, backend counts, service types, and healthy backend counts.
proxy register
gpu proxy register --pod-id llm-ollama-qwen3-5-9b --model qwen3.5:9b --port 11434 --type ollama
gpu proxy register --pod-id abc123 --model meta-llama/Llama-3-8B --port 8000 --type vllm
Manually registers a backend when you need to attach an existing service to the local proxy router.
proxy deregister
gpu proxy deregister --pod-id llm-ollama-qwen3-5-9b --model qwen3.5:9b
gpu proxy deregister --pod-id abc123 --model meta-llama/Llama-3-8B
Removes a backend from the proxy router.
gpu proxy hosted
Manage the org-scoped hosted proxy for the active organization.
proxy create
gpu proxy create
gpu proxy create --backend runpod
gpu proxy create --backend vastai
Creates the org-scoped hosted proxy deployment. Use --backend to pin a provider-hosted backend when you do not want the portal-managed Railway path.
proxy hosted status
gpu proxy hosted status
gpu proxy hosted status --json
Shows hosted deployment status, the public connection URL, and version drift details.
proxy hosted connect-info
gpu proxy hosted connect-info
gpu proxy hosted connect-info --json
Prints the connection details for the active org's hosted proxy.
proxy hosted upgrade
gpu proxy hosted upgrade --yes
Requests a hosted-proxy upgrade when drift is detected. Railway remains portal-managed end to end, while provider-hosted backends reconcile the requested gpu-cli version through the running hosted daemon after initial provisioning.
proxy hosted destroy
gpu proxy hosted destroy
Destroys the org-scoped hosted proxy deployment.
proxy keys
gpu proxy keys create --name "alice"
gpu proxy keys list
gpu proxy keys revoke --name "alice"
Manage org-scoped hosted-proxy API keys for the active organization.
gpu config
Inspect or validate configuration.
gpu config show
gpu config validate
gpu config schema
gpu config get default_profile
gpu config set updates.auto_update false
gpu template
Browse official templates or clear the local template cache.
gpu template list
gpu template list --json
gpu template clear-cache -y
gpu org
Manage organizations, membership, sub-accounts, and service accounts.
gpu org list
gpu org create "My Team"
gpu org delete my-team
gpu org switch my-team
gpu org invite alice@example.com --role admin
gpu org service-account create --name ci
gpu org service-account revoke sa_abc123
See Organizations & Service Accounts for the current model, including headless automation.
gpu serverless
Deploy and manage RunPod Serverless endpoints.
gpu serverless deploy
gpu serverless list --json
gpu serverless status <ENDPOINT_ID>
gpu serverless delete
gpu serverless warm <ENDPOINT_ID> --gpu
gpu serverless logs <ENDPOINT_ID>
gpu serverless template delete
serverless status
gpu serverless status <ENDPOINT_ID>
gpu serverless status <ENDPOINT_ID> --json
serverless logs
gpu serverless logs <ENDPOINT_ID>
gpu serverless logs <ENDPOINT_ID> --tail 100
gpu serverless logs <ENDPOINT_ID> --status error
gpu serverless logs currently points you to the RunPod dashboard rather than streaming or filtering logs inside the CLI, but these are still the public CLI flags.
serverless warm
gpu serverless warm <ENDPOINT_ID> --gpu
gpu serverless warm <ENDPOINT_ID> --cpu
gpu serverless warm <ENDPOINT_ID> --timeout 900
Current behavior
- gpu serverless status and gpu serverless warm should be treated as endpoint-ID-driven commands.
- gpu serverless delete supports name lookup and interactive selection when you omit the endpoint.
- gpu serverless warm --gpu is the supported warmup path today. gpu serverless warm --cpu, deploy-time --warm, and deploy-time --write-ids are not wired through the current runtime.
- gpu serverless logs currently points you to the RunPod dashboard rather than streaming filtered logs in the CLI.
Use Serverless Endpoints for templates, config shape, and request examples.
gpu start
Start a new GPU pod explicitly instead of letting gpu run manage pod lifecycle for you.
gpu start
gpu start --gpu-type "NVIDIA RTX 6000 Ada"
gpu start --min-vram 24 --max-price 1.50
Key flags
| Flag | Description |
|---|---|
--gpu-type <TYPE> | Pin a specific GPU type |
--gpu-count <N> | Request multiple GPUs |
--max-price <PRICE> | Limit hourly spend during selection |
--min-vram <GB> | Set a VRAM floor |
--cloud-type <TYPE> | Choose secure, community, or all |
--region <REGION> | Restrict to a region or datacenter |
--docker-image <IMAGE> | Override the image |
--force | Skip reuse checks and create a new pod |
gpu instances
Inspect or terminate provider-side instances directly.
gpu instances list
gpu instances list --project my-project
gpu instances show pod_abc123
gpu instances terminate pod_abc123 -y
gpu desktop
Manage the desktop app.
gpu desktop install
gpu desktop install --channel beta
gpu desktop uninstall -y
gpu update
Update GPU CLI to the latest version or inspect update availability.
gpu update
gpu update --check
gpu update --target-version 1.2.3
gpu update --dismiss
gpu upgrade
Open the pricing flow to upgrade your subscription.
gpu upgrade
gpu changelog
View release notes by version or version range.
gpu changelog
gpu changelog 0.8.0
gpu changelog --from 0.7.0 --to 0.8.0
gpu daemon
Manage the background daemon.
gpu daemon status
gpu daemon start
gpu daemon restart
gpu daemon logs --follow
gpu doctor
Run a setup diagnostic.
gpu doctor
gpu doctor --json
gpu issue
Submit an issue report with optional daemon logs.
gpu issue
gpu issue "sync not working"
gpu issue --no-logs -y "bug report"
gpu agent-docs
Print the agent-focused CLI reference to stdout.
gpu agent-docs
gpu agent-docs | head -50
gpu support
Open the GPU CLI community support link in your browser.
gpu support
Global flags
These flags work with all commands:
| Flag | Description |
|---|---|
-v, --verbose | Increase logging verbosity (-v, -vv, -vvv) |
-q, --quiet | Minimal output |
--progress-style <STYLE> | Progress display style: panel, pipeline, minimal, verbose |
--help-all | Show all commands including hidden ones |
--no-auto-update | Disable the update check for this invocation |