GPU Rental Costs: Vast.ai
Live on-demand GPU pricing for renting your own image/video inference box. Use the estimator to predict a monthly bill before you spin anything up.
How Vast.ai billing works
- Time-based, billed per second. You pay a fixed $/hr rate for the rented GPU for as long as it is running. The number of generations doesn't change the cost; only wall-clock runtime does.
- Stopped instances still cost a small disk fee ($/GB-month) until you destroy them. Bandwidth is metered but minor.
- Destroy to stop all charges. Stopping pauses compute billing; destroying releases the disk too.
- Contrast: hosted APIs (Higgsfield / fal) bill per generated asset; Vast bills for GPU runtime (per second, as above) regardless of output count.
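The contrast between time-based and per-asset billing comes down to a break-even calculation. The following is a minimal sketch of that arithmetic, not code from this app: the function names are hypothetical, and every rate used with them is a placeholder rather than a quoted Vast.ai or fal price.

```python
SECONDS_PER_HOUR = 3600


def gpu_cost(rate_per_hour: float, running_seconds: float) -> float:
    """Time-based bill: hourly rate prorated over seconds of runtime."""
    return rate_per_hour * running_seconds / SECONDS_PER_HOUR


def api_cost(price_per_asset: float, assets: int) -> float:
    """Per-asset bill, as charged by a hosted API."""
    return price_per_asset * assets


def break_even_assets(rate_per_hour: float, running_seconds: float,
                      price_per_asset: float) -> float:
    """Asset count at which the rented GPU and the hosted API cost the same."""
    return gpu_cost(rate_per_hour, running_seconds) / price_per_asset
```

For example, with a placeholder rate of $0.80/hr, `gpu_cost(0.80, 90 * 60)` gives the bill for a 90-minute session whether it produced one asset or a thousand. If you expect to generate fewer assets than `break_even_assets` returns for that session, the per-asset API is likely the cheaper option.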
One-click backend
Primary generation backend. Add FAL_KEY (id:secret) from fal.ai to enable real generation.
GPU backend control
No Vast instances yet. Click "Launch L40S backend" to rent one and auto-provision it.
Manual provision command
# On the Vast L40S instance (Docker-enabled template):
git clone <your-repo> ai-generator && cd ai-generator
export HF_TOKEN=hf_xxx   # gated weights
bash scripts/vast-provision.sh --with-sdxl
# → paste the printed env block into the app's .env.local, redeploy.

# Idle cost control (from the app host), cron every 5 min:
VAST_DRY_RUN=false pnpm vast:reaper
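The source does not show what `vast:reaper` does internally. The sketch below illustrates one way an idle reaper with a dry-run switch could work, under assumptions: the `Instance` fields, the 15-minute idle limit, and the `destroy` callback are all hypothetical and do not reflect the Vast API or this app's script.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    # Hypothetical fields for illustration; not the Vast API schema.
    instance_id: int
    last_activity_ts: float  # Unix time of the last generation request
    running: bool


def select_idle(instances, now_ts, idle_limit_s=900):
    """Return running instances with no activity for at least idle_limit_s."""
    return [i for i in instances
            if i.running and now_ts - i.last_activity_ts >= idle_limit_s]


def reap(instances, now_ts, destroy, dry_run=True, idle_limit_s=900):
    """Call destroy(instance_id) for each idle instance unless dry_run is set.

    Returns the ids that were (or, in a dry run, would have been) destroyed.
    """
    idle = select_idle(instances, now_ts, idle_limit_s)
    for inst in idle:
        if not dry_run:
            destroy(inst.instance_id)
    return [inst.instance_id for inst in idle]
```

Defaulting `dry_run` to `True` means a misconfigured cron job reports what it would remove instead of removing it, which fits a script that requires an explicit `VAST_DRY_RUN=false` before it acts.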
Session cost estimator
Select a GPU to estimate costs.
Estimates only — actual cost depends on the specific host you pick, runtime, storage, and bandwidth. Remember to destroy the instance when done; idle running time is still billed.
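The estimate combines the components named above: runtime, storage, and bandwidth. The following is a rough sketch of such a breakdown, not the app's actual estimator. It assumes disk is billed for the whole life of the instance (running or stopped) and prorates the $/GB-month rate over a 730-hour month; both are assumptions, and all parameter names are hypothetical.

```python
HOURS_PER_MONTH = 730  # average month; assumption used to prorate $/GB-month


def estimate_session(gpu_rate_hr, hours_running, disk_gb, disk_rate_gb_month,
                     hours_until_destroy, bandwidth_gb=0.0, bandwidth_rate_gb=0.0):
    """Return a cost breakdown in dollars for one rental session.

    hours_until_destroy covers the whole life of the instance (running plus
    stopped), since disk is assumed to be billed until the instance is destroyed.
    """
    if hours_until_destroy < hours_running:
        raise ValueError("instance cannot run longer than it exists")
    compute = gpu_rate_hr * hours_running
    storage = disk_gb * disk_rate_gb_month * hours_until_destroy / HOURS_PER_MONTH
    bandwidth = bandwidth_gb * bandwidth_rate_gb
    return {
        "compute": compute,
        "storage": storage,
        "bandwidth": bandwidth,
        "total": compute + storage + bandwidth,
    }
```

Raising `hours_until_destroy` while holding `hours_running` fixed shows how much a stopped but undestroyed instance adds to the bill.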
How to rent & wire it up
1. On console.vast.ai, rent an on-demand instance with a PyTorch or ComfyUI template and enough VRAM for your model.
2. Expose the service port, then point this app's provider URLs at the instance IP — e.g. set COMFYUI_URL, LTX_URL, or A1111_URL in your environment.
3. When you're finished, destroy the instance in the Vast console to stop both compute and storage charges.
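Step 2 can be sanity-checked before redeploying. The sketch below builds a provider URL from an instance address and checks that any of the three variables named above that are set hold well-formed http(s) URLs. The helper names are hypothetical, and the address and port in the usage note are placeholders; use the IP and externally reachable port shown for your own instance.

```python
import os
from urllib.parse import urlparse

PROVIDER_VARS = ("COMFYUI_URL", "LTX_URL", "A1111_URL")


def provider_url(host: str, port: int, scheme: str = "http") -> str:
    """Build the base URL for a service exposed on the rented instance."""
    return f"{scheme}://{host}:{port}"


def configured_providers(env=None) -> dict:
    """Return the provider variables that are set, validating each URL."""
    env = os.environ if env is None else env
    found = {}
    for name in PROVIDER_VARS:
        value = env.get(name, "").strip()
        if not value:
            continue  # unset providers are simply skipped
        parsed = urlparse(value)
        if parsed.scheme not in ("http", "https") or not parsed.hostname:
            raise ValueError(f"{name} is not a valid http(s) URL: {value!r}")
        found[name] = value
    return found
```

For instance, `provider_url("203.0.113.10", 8188)` yields `http://203.0.113.10:8188`, which could be assigned to `COMFYUI_URL`. A value missing its scheme, such as `203.0.113.10:8188`, is rejected by `configured_providers`.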