live-video-to-video pipeline (also called
Cascade) that powers live style transfer and generative video apps like Daydream. Unlike
batch AI, your node doesn’t answer discrete requests: it
continuously transforms a live WebRTC stream, frame by frame, with a ~33 ms budget per frame at
30 fps.
This guide assumes a working orchestrator (see the tutorial)
and familiarity with batch AI setup — the flags are the same;
the runner, models, and hardware bar are not.
How it differs from batch AI
| | Batch AI | Real-time AI (Cascade) |
|---|---|---|
| Input / output | Discrete request → result | Continuous WebRTC stream in → transformed stream out |
| Latency target | Seconds per request | < 100 ms per frame |
| Runner image | livepeer/ai-runner | livepeer/ai-runner:live-base |
| Models | Standard diffusion, Whisper, BLIP | StreamDiffusion, ComfyUI DAGs, ControlNet |
| Min VRAM | 4–24 GB by pipeline | 24 GB recommended |
The moving parts
Your orchestrator receives a WebRTC stream from a gateway, hands it to the AI runner, and streams processed frames back.
Set it up
Verify GPU access
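As a rough sketch of this check, the snippet below parses the output of `nvidia-smi --query-gpu=name,memory.total --format=csv,noheader,nounits` and lists the GPUs that meet the 24 GB recommendation. The helper names and the 24,000 MiB threshold are assumptions made for illustration (24 GB cards typically report slightly under 24,576 MiB), and the check only covers what the host sees, not what a container sees.

```python
import subprocess

# Real nvidia-smi query options; memory.total is reported in MiB with nounits.
QUERY = [
    "nvidia-smi",
    "--query-gpu=name,memory.total",
    "--format=csv,noheader,nounits",
]

# Assumption: a threshold a little below 24 * 1024 MiB, because 24 GB cards
# usually report somewhat less than the nominal figure.
MIN_VRAM_MIB = 24_000


def parse_gpus(csv_text):
    """Turn 'name, memory_mib' lines into a list of (name, memory_mib) tuples."""
    gpus = []
    for line in csv_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, mem = line.rsplit(",", 1)
        gpus.append((name.strip(), int(mem.strip())))
    return gpus


def meets_recommendation(gpus, min_mib=MIN_VRAM_MIB):
    """Return the names of GPUs with at least min_mib of total memory."""
    return [name for name, mem in gpus if mem >= min_mib]


def query_gpus():
    """Run nvidia-smi on the host and parse its output (raises if it is missing)."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpus(result.stdout)
```

Calling `meets_recommendation(query_gpus())` on the orchestrator host returns an empty list when no visible GPU reaches the threshold.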
Pull the live-base runner
For live pipelines, pull the live-base tag of livepeer/ai-runner; it bundles ComfyStream and its dependencies.
Configure aiModels.json
Add a live-video-to-video entry. The model_id names the ComfyUI workflow/pipeline, not a Hugging Face model; the models load inside the ComfyStream container. Your price_per_unit must be at or below the gateway’s -maxPricePerCapability for this pipeline: this is a hard filter, regardless of your hardware.
Download model weights
Weights must exist before the container starts. Download them into the directory your node exposes via -aiModelsDir.
Start with the AI worker flags
Use the same three flags as batch AI (-aiWorker, -aiModels, -aiModelsDir) with your live entry in aiModels.json. Run with -v 6 initially to watch frame-loop activity in the logs.
Verify registration
Your node should appear under live-video-to-video with Warm status at
tools.livepeer.cloud/ai/network-capabilities.
Test end-to-end with the utilities in the ComfyStream repo.
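The aiModels.json entry from the configuration step can be sketched in Python as follows. This is a minimal illustration and makes assumptions: the `pipeline` key and the top-level JSON array are taken to match the batch AI file layout, the `my-comfyui-workflow` model ID and the price values are hypothetical placeholders, and `live_entry` and `check_entry` are helper names invented here. Only `model_id`, `price_per_unit`, and `"warm": true` come from this guide.

```python
import json


def live_entry(model_id, price_per_unit, warm=True):
    """Build one aiModels.json entry for the live-video-to-video pipeline."""
    return {
        "pipeline": "live-video-to-video",  # assumed key name, as in batch AI
        "model_id": model_id,               # names the ComfyUI workflow/pipeline
        "price_per_unit": price_per_unit,
        "warm": warm,
    }


def check_entry(entry, gateway_max_price):
    """Return a list of problems that would keep this entry from being useful."""
    problems = []
    if not entry.get("warm"):
        problems.append("warm must be true for live pipelines")
    if entry["price_per_unit"] > gateway_max_price:
        problems.append("price_per_unit exceeds the gateway's price cap")
    return problems


# Hypothetical values for illustration only.
entry = live_entry("my-comfyui-workflow", price_per_unit=1000)

# Assumption: the file holds a JSON array of entries.
config = json.dumps([entry], indent=2)
```

Writing `config` to the file passed via -aiModels, after `check_entry` returns an empty list for the gateway cap you expect to face, catches the two mistakes this guide calls out: a cold model and a price above the gateway's filter.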
Tuning for the frame budget
Everything about Cascade operation is a fight for milliseconds:
- Keep models warm. A cold model load mid-stream is a dropped stream. "warm": true is not optional here.
- Trade steps for latency. StreamDiffusion workflows can run with as few as 2 inference steps; quality vs latency is tunable in the workflow definition.
- Watch VRAM headroom. Frame buffers ride on top of model weights, so a node that fits batch models exactly will OOM on live streams.
- Dedicate the GPU. Sharing a Cascade GPU with transcoding or batch AI undermines the latency that gets you selected.
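To make the steps-versus-latency trade concrete, the sketch below computes the per-frame budget for a given frame rate and the largest number of inference steps that fits inside it. The function names, the fixed-overhead figure, and the per-step timing are hypothetical inputs; measure your own workflow to get real values.

```python
def frame_budget_ms(fps):
    """Time available per frame, in milliseconds, at the given frame rate."""
    return 1000.0 / fps


def max_steps(fps, fixed_overhead_ms, per_step_ms):
    """Largest number of inference steps that fits in one frame interval.

    fixed_overhead_ms covers everything that does not scale with step count
    (decode, encode, pre/post-processing); per_step_ms is the cost of a single
    inference step. Both are measurements you supply.
    """
    available = frame_budget_ms(fps) - fixed_overhead_ms
    if available < per_step_ms:
        return 0  # not even one step fits
    return int(available // per_step_ms)


# Hypothetical timings: 8 ms of fixed overhead and 10 ms per step.
budget = frame_budget_ms(30)
steps_at_30 = max_steps(30, fixed_overhead_ms=8.0, per_step_ms=10.0)
steps_at_15 = max_steps(15, fixed_overhead_ms=8.0, per_step_ms=10.0)
```

With these made-up timings only 2 steps fit at 30 fps, while halving the frame rate leaves room for 5, which is the kind of trade the workflow definition lets you tune.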
Going further
Operators who want custom live pipelines (beyond ComfyUI workflows) can implement the AI runner’s Python Pipeline interface and package it as an image extending ai-runner:live-base; see the
ai-runner docs and the
Pipeline interface.
Note that custom pipelines currently require small upstream additions (the pipeline registry is a
static mapping); a dynamic plugin architecture is in progress.
Related
Add AI inference (batch)
The batch pipelines — start here if you haven’t run AI yet.
Hardware & GPU support
VRAM requirements across workloads.
Monitor your orchestrator
Watch the AI runner container and frame latency.
Set pricing
Price per capability, and gateway price caps.