Reflex
New v0.7 ships TensorRT EP first-class — 5.55× speedup

Take a robot policy off the training cluster, onto a robot.

Reflex is the deployment layer for vision-language-action models. Take any pi0, pi0.5, SmolVLA, or GR00T checkpoint and run it on a Jetson Orin or desktop GPU in one command — verified at machine-precision parity to PyTorch. Or just tell the chat agent to do it for you.

$ curl -fsSL https://fastcrest.com/install | sh && reflex chat
Don't trust pipes-to-shell? View the installer source first — 181 lines, no telemetry, MIT.
v0.7 · Apache 2.0 in 2030 · Python ≥ 3.10

How it works

From a HuggingFace model to a robot, in four steps.

reflex go --model <hf_id> runs all four. Each step writes a verifiable artifact and refuses to ship if its check fails — bad exports never reach a robot.

1. Pull: reflex models pull from HuggingFace
2. Export: torch → ONNX (cos = +1.0 verified)
3. Serve: FastAPI + ORT-TRT, /act + composable wedges
4. Act: edge GPU → robot, 10–50 ms / chunk
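For example, a sketch of both paths (the model id is illustrative and the per-step flags are assumed; only the subcommand names appear elsewhere on this page):

$ reflex go --model lerobot/smolvla_base       # all four steps, one command
$ reflex models pull lerobot/smolvla_base      # or step 1 on its own (argument form assumed)
$ reflex export --model lerobot/smolvla_base   # step 2: torch → ONNX, parity-checked
$ reflex serve --model lerobot/smolvla_base    # step 3: /act on the edge GPU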

What it looks like

Talk to your robot fleet in plain English.

reflex chat wraps the entire CLI surface in a natural-language agent. 100 calls/day free, no signup, no API key.

Real reflex chat session — real LLM, real tool calls, real benchmark data from the live registry. download .cast
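If you'd rather not watch the recording, an invented exchange gives the flavor (the prompt and reply below are illustrative, not taken from the session; the dispatched commands are the documented CLI):

$ reflex chat
you> pull smolvla, check this machine, and show me the registry
reflex> running: reflex models pull … → reflex doctor → reflex models list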

Composable wedges

Every flag is opt-in. Compose only what you need.

14 runtime wedges layer on top of reflex serve — safety, observability, optimization, transport. Every response surfaces telemetry from the wedges you've enabled.
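A hypothetical composition (the --wedge flag and the wedge names below are invented for illustration; the page only says wedges are opt-in flags layered on reflex serve):

$ reflex serve --model lerobot/smolvla_base --wedge actionguard --wedge otel --wedge fp16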

Numbers

Verifiable claims, not vibes.

All three reproducible with one command on Modal. See the parity ledger and changelog for full provenance.

9× faster than monolithic ONNX (decomposed pi0.5 on Jetson AGX Orin)
5.55× TensorRT FP16 vs ORT-CUDA on A10G (SmolVLA monolithic)
cos = +1.0 numerical parity to PyTorch on all 4 supported VLAs

vs other tools

Where Reflex fits.

Reflex is deliberately narrow. Here's the honest read.

| | Reflex | Triton | HF Endpoints | Raw ONNX |
|---|---|---|---|---|
| Edge GPU deployment | design center | cloud-first | cloud-only | DIY |
| VLA-specific export (pi0 / pi0.5 / GR00T) | built-in, validated | no | no | manual, error-prone |
| Verified machine-precision parity | automatic | DIY | DIY | DIY |
| Decomposed pi0.5 (9× speedup) | one flag | no | no | ~weeks of work |
| Setup time | 30 seconds | days | minutes | 1–3 weeks |
| Multi-tenant cloud serving at scale | not the design | battle-tested | managed | DIY |

Honest details: vs other tools →

Common questions

FAQ

Why not just use Triton?
Triton is excellent for multi-tenant cloud inference at scale — many models, many services, datacenter GPUs, an ML platform team. Reflex is for the opposite: one model, one robot, one process, edge GPU, one developer. Reflex ships VLA-specific features Triton doesn't (decomposed pi0.5, A2C2, ActionGuard with URDF, episode-aware policy routing). The two compose — you can reflex export and drop the result into Triton if you want both. Full comparison →
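A sketch of that compose path, assuming reflex export emits an ONNX file (the flags and output path are assumed); the model-repository layout and tritonserver invocation are standard Triton conventions:

$ reflex export --model lerobot/smolvla_base   # flags assumed; produces model.onnx
$ mkdir -p model_repo/smolvla/1 && cp model.onnx model_repo/smolvla/1/
$ tritonserver --model-repository=$(pwd)/model_repo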
Does it work without a GPU?
Yes — install with pip install 'reflex-vla[serve,onnx]'. The CPU path runs all four supported VLAs at machine-precision parity to PyTorch; SmolVLA is the only one fast enough for real-time control on CPU. reflex chat needs no GPU at all once the package is installed.
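A minimal CPU round-trip under assumptions (the default port, the --device flag, and the observation payload are guesses; /act is the documented endpoint):

$ pip install 'reflex-vla[serve,onnx]'
$ reflex serve --model lerobot/smolvla_base --device cpu
$ curl -X POST localhost:8000/act -H 'Content-Type: application/json' -d @observation.json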
What does BSL 1.1 mean for me?
Reflex is source-available under BSL 1.1 — same license HashiCorp, MongoDB, and Sentry use. Free for personal, academic, and commercial use, including embedding in your own product. The only restriction is offering Reflex itself as a competing hosted service. Auto-converts to Apache 2.0 four years after each release. License details →
Does it work on RTX 5090 / Blackwell?
Not yet. ORT's bundled cuBLAS / cuDNN don't ship sm_100 kernels, so reflex go segfaults at startup on Blackwell. Workaround: use reflex chat (no GPU needed), reflex doctor, and reflex models list. /act needs a non-Blackwell GPU temporarily — Modal A10G/A100 or RTX 4090. We're tracking ORT upstream for the fix.
Can I use this in a commercial product?
Yes — BSL 1.1 explicitly permits commercial use, including embedding in proprietary products you ship to your own customers. The only restricted case is offering Reflex itself as a competing hosted service. Most legitimate use cases (deploying to your own robots, your own labs, your own customers) are clearly in the free bucket.
How does this compare to NVIDIA's GR00T runtime?
Reflex is the only open-source one-command deploy path for GR00T, as far as we know. NVIDIA's runtime is closed-source and locked to their hardware. Reflex supports GR00T alongside pi0 / pi0.5 / SmolVLA — multi-vendor, source-available, and works on Jetson Orin (not just Thor).
What's the Pro tier?
Open-source Reflex covers everything except continuous self-distillation. Pro ($99/mo) adds an automated loop: collect production traffic → distill a customer-specific 1-step student every N hours → gate via a 9-check methodology → atomic warm-swap. Pricing details →