Add AI inference - Livepeer Docs

This guide adds AI inference to an orchestrator that is already running and activated for transcoding. By the end, your node accepts AI jobs alongside video work.

Setting up from scratch? Complete "Run your first orchestrator" first, then come back here. AI pipelines require Linux.

1. Check your available VRAM

AI inference runs in a separate Docker container. If it shares a GPU with transcoding, VRAM is split between them. Check what’s free:

nvidia-smi --query-gpu=index,name,memory.total,memory.free --format=csv

Pick a pipeline you can actually fit:

Pipeline	Min VRAM
`image-to-text`	4 GB
`segment-anything-2`	6 GB
`llm` (quantized 7–8B)	8 GB
`audio-to-text` (Whisper)	12 GB
`image-to-video`	16 GB+
`image-to-image`	20 GB
`text-to-image` (SD/SDXL)	24 GB

If the GPU lacks free VRAM for both transcoding and your chosen pipeline, AI runner containers fail to start. Pick a lower-VRAM pipeline, dedicate a second GPU to AI, or stop transcoding on that GPU.
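The fit check above can be scripted. This is a minimal sketch, assuming the minimum-VRAM figures from the table and treating 1 GB as 1024 MiB (nvidia-smi reports memory in MiB). The function names `fits_pipeline` and `report_fit` are illustrative and not part of go-livepeer.

```shell
# Hypothetical helper: succeeds if free VRAM covers the pipeline minimum.
# $1 = free VRAM in MiB, $2 = pipeline minimum in GB (from the table above).
fits_pipeline() {
  [ "$1" -ge $(( $2 * 1024 )) ]
}

# Reads "index, free_mib" lines on stdin and reports, per GPU, whether a
# pipeline with the given minimum ($1, in GB) fits.
report_fit() {
  min_gb=$1
  while IFS=', ' read -r idx free; do
    if fits_pipeline "$free" "$min_gb"; then
      echo "gpu $idx: fits"
    else
      echo "gpu $idx: too small"
    fi
  done
}

# Only query the driver when nvidia-smi is installed.
# 12 is the audio-to-text minimum from the table; change it for your pipeline.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=index,memory.free --format=csv,noheader,nounits | report_fit 12
fi
```

This only compares against the table minimum. It does not account for VRAM that transcoding will claim after the check runs, so leave some margin on a shared GPU.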

See the hardware reference for the full VRAM-by-workload table.

2. Pull the AI runner image

docker pull livepeer/ai-runner:latest
# Some pipelines need a dedicated image, e.g.:
docker pull livepeer/ai-runner:segment-anything-2

3. Configure `aiModels.json`

This file tells your node which pipelines and models to serve, what to charge, and what to keep warm in VRAM. Create ~/.lpData/aiModels.json with at least one entry:

[
  {
    "pipeline": "text-to-image",
    "model_id": "ByteDance/SDXL-Lightning",
    "price_per_unit": 4768371,
    "warm": true
  }
]

Field	Required	Description
`pipeline`	Yes	Pipeline name (e.g. `text-to-image`, `audio-to-text`, `llm`)
`model_id`	Yes	Hugging Face model ID (must be on the Livepeer-verified list)
`price_per_unit`	Yes	Price in wei per unit
`warm`	No	If `true`, preload into VRAM on startup
`capacity`	No	Max concurrent requests (default `1`)
`optimization_flags`	No	`SFAST` (~+25% speed) or `DEEPCACHE` (~+50% speed); the two cannot be combined

Don’t use DEEPCACHE with Lightning/Turbo models — they’re already optimized and quality drops. SFAST and DEEPCACHE can’t be combined. Changes to aiModels.json are not hot-reloaded — restart the node after editing.
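Because edits only take effect after a restart, it helps to confirm the file still parses before you restart. This is a minimal sketch, assuming `python3` is on your PATH. `validate_ai_models` is a hypothetical helper, and it checks JSON syntax only, not whether the pipeline or `model_id` is accepted by the node.

```shell
# Hypothetical pre-restart check (not part of go-livepeer).
# Succeeds if the file at $1 parses as JSON; syntax only, no field validation.
validate_ai_models() {
  python3 -m json.tool "$1" >/dev/null 2>&1
}

# Path used in this guide; override by exporting AI_MODELS.
AI_MODELS="${AI_MODELS:-$HOME/.lpData/aiModels.json}"

if [ -f "$AI_MODELS" ]; then
  if validate_ai_models "$AI_MODELS"; then
    echo "aiModels.json parses; restart the node to apply changes"
  else
    echo "aiModels.json is not valid JSON; fix it before restarting" >&2
  fi
fi
```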

4. Enable the AI worker

Add three flags to your startup command:

livepeer \
  ...your existing transcoding flags... \
  -aiWorker \
  -aiModels ~/.lpData/aiModels.json \
  -aiModelsDir ~/.lpData/models

Flag	What it does
`-aiWorker`	Enables the AI worker; without it, all AI config is ignored
`-aiModels`	Path to `aiModels.json`
`-aiModelsDir`	Host directory holding cached model weights

Running in Docker? Mount the Docker socket so the node can spawn AI runner containers, and use port 8936 to avoid clashing with the transcoding orchestrator on 8935:

docker run --name livepeer-ai-orchestrator \
  -v ~/.lpData/:/root/.lpData/ \
  -v /var/run/docker.sock:/var/run/docker.sock \
  --network host --gpus all \
  livepeer/go-livepeer:master \
  -orchestrator -serviceAddr 0.0.0.0:8936 -nvidia 0 \
  -aiWorker \
  -aiModels /root/.lpData/aiModels.json \
  -aiModelsDir ~/.lpData/models

With Docker-out-of-Docker, -aiModelsDir must be a path on the host machine, not inside the container — the node passes it directly to the runner containers it spawns.
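In the command above, the unquoted `~` is expanded by the host shell before Docker sees it, which is why the value ends up as a host path. If you quote the argument or move it into a file that is not shell-expanded, the tilde stays literal. The sketch below is a hypothetical guard (`is_host_models_dir` is not a go-livepeer command) that accepts only an absolute path that exists on the machine where you run it.

```shell
# Hypothetical helper: succeeds only for an absolute path that exists as a
# directory on this machine. Run it on the host, not inside the container.
is_host_models_dir() {
  case "$1" in
    /*) [ -d "$1" ] ;;   # absolute: must also exist
    *)  return 1 ;;      # relative, or a literal "~" that was never expanded
  esac
}

# Path used in this guide, written with $HOME so it is always absolute.
MODELS_DIR="$HOME/.lpData/models"

if is_host_models_dir "$MODELS_DIR"; then
  echo "using host models dir: $MODELS_DIR"
else
  echo "models dir missing or not absolute: $MODELS_DIR" >&2
fi
```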

5. Verify AI is active

Within seconds of startup you should see a managed-container log line for each warm model:

INFO Starting managed container gpu=0 name=text-to-image_ByteDance_SDXL-Lightning ...

Then send a test request to the runner:

curl -X POST "http://localhost:8000/text-to-image" \
  -H "Content-Type: application/json" \
  -d '{"model_id":"ByteDance/SDXL-Lightning","prompt":"A cool cat on the beach","width":512,"height":512}'

A successful response contains an images array. Finally, confirm your pipelines appear externally at tools.livepeer.cloud/ai/network-capabilities (search your orchestrator address; allow 2–5 minutes). If jobs still don’t arrive, check aiModels.json is valid, the model_id matches a verified model, and the runner is reachable — see the AI troubleshooting entries in the FAQ.
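The runner check can be wrapped in a small script. This is a sketch under the assumptions already stated in this step: the runner listens on `localhost:8000` and a successful response contains an `images` array. `has_images` and `check_runner` are hypothetical helper names, and the `grep` match is a rough text check rather than a JSON parse.

```shell
# Hypothetical helper: succeeds if the response body in $1 contains an
# "images" key followed by an array.
has_images() {
  printf '%s' "$1" | grep -Eq '"images"[[:space:]]*:[[:space:]]*\['
}

# Sends the same test request as above and reports the result.
# Call it manually once the runner container is up: check_runner
check_runner() {
  body=$(curl -s --max-time 120 -X POST "http://localhost:8000/text-to-image" \
    -H "Content-Type: application/json" \
    -d '{"model_id":"ByteDance/SDXL-Lightning","prompt":"A cool cat on the beach","width":512,"height":512}') || body=""
  if has_images "$body"; then
    echo "runner OK: response contains an images array"
  else
    echo "runner check failed: no images array in response" >&2
    return 1
  fi
}
```

A passing `check_runner` only shows that the local runner answers. It says nothing about whether your pipelines are visible on the network, so still confirm them on the network-capabilities page.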

Next

Set pricing

Price each pipeline and model competitively.

Hardware reference

VRAM planning and GPU selection for AI.