BeastMode banner artwork for the Chewbacuh and LiL-Beastly capacity dispatch

Mission Control Dispatch | Published May 7, 2026

BeastMode inference lanes: Chewbacuh vs LiL-Beastly

We moved the two BeastMode ESXi inference guests onto static 192.168.12.x service addresses, published them to the public Yeti Claw surface, and then ran the first controlled capacity benchmark through the live BeastMode route to see where each lane stays interactive and where it simply starts queueing.

BeastMode Chewbacuh LiL-Beastly Qwen3 ESXi Mission Control

Download The PDF Download Artifact Bundle Back To Mission Control

Chewbacuh baseline 10.37s

Average public-path latency at concurrency 1 on qwen3:8b.

LiL-Beastly baseline 21.20s

Average public-path latency at concurrency 1 on qwen3:14b.

Highest tested band 4

Both lanes completed the 4-way step with zero request failures.

Peak CPU busy 100%

Both VMs saturated CPU on every benchmark step.

Executive read

What we learned

Chewbacuh is the fast lane. It averaged 10.37 seconds at one live conversation and held throughput around 0.098 requests per second across the entire tested band.
LiL-Beastly is the heavier lane. It averaged 21.20 seconds at one live conversation and held throughput near 0.05 requests per second across the tested band.
Neither lane failed under load. Both completed every request through concurrency 4 with zero HTTP or inference failures.
Both lanes became queue-bound quickly. Additional concurrency mostly increased wait time instead of increasing useful throughput.

Systems under test

Benchmark envelope and operator guardrails

Lane	Guest profile	Model	Static service IP	Guardrail
Chewbacuh	8 vCPU, 48 GiB RAM, ESXi guest	qwen3:8b	192.168.12.173	Controlled cap at concurrency 4 to preserve interactive access
LiL-Beastly	12 vCPU, 96 GiB RAM, ESXi guest	qwen3:14b	192.168.12.174	Controlled cap at concurrency 4 because the 14B lane is explicitly queue-heavy

Curve read

Throughput, latency, and guest pressure

Both BeastMode lanes saturate CPU early. The important difference is starting latency and model weight: Chewbacuh stays noticeably faster, while LiL-Beastly trades speed for the larger 14B model. Mission Control captured guest CPU, load average, and memory use. Physical temperature telemetry is not exposed reliably inside these VMs, so it is intentionally excluded from the committee findings.

Throughput curve comparing Chewbacuh and LiL-Beastly — Throughput stays almost flat, which is the signature of queueing rather than parallel speed-up.

Latency curve comparing Chewbacuh and LiL-Beastly — Latency rises sharply as sessions stack up on CPU-saturated lanes.

Guest CPU and memory curves for Chewbacuh and LiL-Beastly — Both guests hit 100% CPU busy; LiL-Beastly also carries the larger memory footprint.

Comparative table

Headline operating bands

Lane	Tested range	Hard failures	Recommended operating band	Peak environment
Chewbacuh	1-4 concurrent conversations	0	1 premium / 2 acceptable / 4 queued	100% peak CPU busy, 6.80 GiB peak guest memory used
LiL-Beastly	1-4 concurrent conversations	0	1 premium / 2 acceptable / 4 queued-heavy	100% peak CPU busy, 11.30 GiB peak guest memory used

Chewbacuh step table

Validated qwen3:8b results

Concurrency	Success	Throughput rps	Avg latency s	P95 s	Peak CPU %	Peak load1	Peak mem MB
1	6/6	0.096	10.37	10.71	100.0	6.01	6749.2
2	6/6	0.099	18.56	20.41	100.0	7.42	6751.3
4	12/12	0.098	35.88	41.24	100.0	8.09	6802.1

LiL-Beastly step table

Validated qwen3:14b results

Concurrency	Success	Throughput rps	Avg latency s	P95 s	Peak CPU %	Peak load1	Peak mem MB
1	6/6	0.047	21.20	24.20	100.0	11.28	11264.2
2	6/6	0.050	36.48	40.06	100.0	12.16	11245.5
4	12/12	0.051	68.47	78.62	100.0	12.28	11299.0

Operator opinion

How to use the lanes

Route speed-sensitive public chat to Chewbacuh first.
Reserve LiL-Beastly for prompts where the larger model is worth the higher wait time.
Keep clear working indicators in the UI because both lanes become queue-bound before they become unstable.
Do not market concurrency 4 as “real-time” on either lane. It is stable, but it is not snappy.

Downloads

Dispatch desk

Pull the committee PDF, the raw JSON payload, CSV summaries, benchmark runner, and chart assets from the BeastMode capacity pack.

Download PDF Grab ZIP Raw JSON

Key findings

Committee highlights

Chewbacuh is the faster lane

Chewbacuh cuts baseline latency roughly in half compared with LiL-Beastly.

LiL-Beastly carries the heavier model

The 14B lane is slower but stable, making it a deliberate choice rather than a default choice.

Both lanes saturate CPU early

Throughput stays nearly flat while latency rises, so operator UX needs queue expectations.

Keep reading