Logs explain individual incidents; metrics show whether throughput, latency, capacity, and
ticket flow are trending the right way. You want both.
1. Enable metrics
go-livepeer exposes a Prometheus endpoint when you start it with -monitor:
http://localhost:7935/metrics — the same port as the go-livepeer CLI;
-monitor just activates the /metrics path on it.
| Flag | What it does |
|---|---|
| -monitor | Enables /metrics. Required for any Prometheus scraping. |
| -metricsPerStream | Breaks performance metrics out per stream; useful for diagnosing a single session. |
| -metricsClientIP | Adds the client IP to metric labels, so you can see which gateway is routing to you. |
In a split orchestrator/transcoder setup, pass -monitor to both processes; each exposes its
own /metrics on its own CLI port.

2. Stand up a dashboard
There are two routes:
- Fastest: the Livepeer Docker stack.
- Existing Prometheus: add each node's /metrics endpoint as a scrape target.
Livepeer maintains a Docker image that bundles Prometheus, Grafana, and starter dashboards. Once it is running, open Grafana at
http://localhost:3000 (default login admin / admin). For multiple
nodes, pass a comma-separated list: LP_NODES=node1:7935,node2:7935. LP_MODE also supports
docker-compose and kubernetes. Source and dashboards:
livepeer/livepeer-monitoring.

3. Watch the signals that matter
Monitor across five layers; the last two are where money is won or lost:

| Layer | What to watch | How |
|---|---|---|
| Hardware | GPU utilization, VRAM, temperature | nvidia-smi, DCGM exporter |
| Application | Segment / job success rate, session capacity | /metrics, dashboard |
| Network | Latency, packet loss | host monitoring |
| On-chain | Active-set status, bonded stake, reward calls | Explorer |
| Economics | ETH fees, LPT rewards | Explorer, /metrics |
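For the hardware layer, nvidia-smi can report utilization, VRAM, and temperature as CSV, which is handy for a quick check or a cron job; for continuous scraping, the DCGM exporter listed above is the better fit. The Go sketch below shells out to nvidia-smi with --query-gpu and --format=csv,noheader,nounits and parses the rows. The gpuStat type and field order are illustrative choices.

```go
package main

import (
	"fmt"
	"os/exec"
	"strconv"
	"strings"
)

// gpuStat holds one row of nvidia-smi --query-gpu output.
type gpuStat struct {
	Index       int
	UtilPct     float64
	MemUsedMiB  float64
	MemTotalMiB float64
	TempC       float64
}

// queryFields lists the columns in the order parseGPUStats expects.
const queryFields = "index,utilization.gpu,memory.used,memory.total,temperature.gpu"

// parseGPUStats parses output produced with --format=csv,noheader,nounits.
// Fields reported as "[N/A]" cause an error rather than a silent zero.
func parseGPUStats(out string) ([]gpuStat, error) {
	var stats []gpuStat
	for _, line := range strings.Split(strings.TrimSpace(out), "\n") {
		if strings.TrimSpace(line) == "" {
			continue
		}
		f := strings.Split(line, ",")
		if len(f) != 5 {
			return nil, fmt.Errorf("unexpected row %q", line)
		}
		var nums [5]float64
		for i, s := range f {
			v, err := strconv.ParseFloat(strings.TrimSpace(s), 64)
			if err != nil {
				return nil, fmt.Errorf("field %d in %q: %w", i, line, err)
			}
			nums[i] = v
		}
		stats = append(stats, gpuStat{
			Index: int(nums[0]), UtilPct: nums[1],
			MemUsedMiB: nums[2], MemTotalMiB: nums[3], TempC: nums[4],
		})
	}
	return stats, nil
}

func main() {
	out, err := exec.Command("nvidia-smi",
		"--query-gpu="+queryFields, "--format=csv,noheader,nounits").Output()
	if err != nil {
		fmt.Println("nvidia-smi failed:", err)
		return
	}
	stats, err := parseGPUStats(string(out))
	if err != nil {
		fmt.Println(err)
		return
	}
	for _, g := range stats {
		fmt.Printf("GPU %d: %.0f%% util, %.0f/%.0f MiB VRAM, %.0f C\n",
			g.Index, g.UtilPct, g.MemUsedMiB, g.MemTotalMiB, g.TempC)
	}
}
```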
Within those layers, these are the signals to watch first:

| Signal | What it tells you |
|---|---|
| livepeer_current_sessions_total vs livepeer_max_sessions | How close to capacity you are (idle → consider lowering your price; maxed out → add GPU capacity) |
| livepeer_segment_processed_total / livepeer_segment_errors_total | Core transcoding health. Rising errors → gateways deprioritize you |
| livepeer_transcode_latency_seconds | GPU saturation or a slow pipeline; both hurt gateway scoring |
| Winning tickets received vs redeemed | A growing gap means an ETH-balance or redemption problem |
| Round number & reward-call status | Whether you’re claiming inflation every round |
4. Monitor the AI runner (if you serve AI)
AI inference runs in a separate ai-runner container, so watch it independently for a faster signal:

| Log message | Meaning |
|---|---|
| Loaded model <id> | Model is warm in VRAM and ready to process |
| Error loading model | Bad model ID or not enough VRAM |
| CUDA out of memory | VRAM exhausted; lower the capacity in aiModels.json or reduce the number of models |
| Container health check failed | Alive but not responding |
5. Capture logs
By default, logs go to stdout only. For a production node, persist them and know the two debug levers:

Also on the Explorer

Even without a metrics stack, explorer.livepeer.org shows your active-set status, bonded and delegated LPT, reward-call history, fee earnings, and historical performance. It is the fastest external gut-check that your node is healthy and earning.

Related
- Configure your orchestrator: the flags these metrics reflect (sessions, GPUs, pricing).
- Activate on Arbitrum: reward calling and the ETH balance your alerts depend on.
- Not receiving jobs? When the dashboard says idle, work through the four causes.
- Hardware & GPU: VRAM and session-limit context for capacity metrics.