New v0.12.0 — deployment proof packets, RTSM backend fit, runtime safety, rollout evidence

Prove a robot policy is safe to deploy before it touches production hardware.

Tether (formerly Reflex) is the deployment-proof CLI for vision-language-action models. Export with parity, serve under a latency budget, replay real traces, enforce ActionGuard safety, and produce the evidence packet a robotics team needs before putting a policy on hardware.

$ curl -fsSL https://fastcrest.com/install | sh && tether chat
Don't trust pipes-to-shell? View the installer source first — 177-line Bash bootstrap; the installer itself does not phone home. Tether is BSL 1.1 with documented, configurable runtime telemetry.
v0.12.0 · Apache 2.0 in 2030 · Python ≥ 3.10

Deployment Proof

A pass/hold packet for robot releases.

Run Tether against the model, target hardware, replay session, and safety config you actually plan to ship. The output is a concrete deployability record, not a demo screenshot.
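
As a rough illustration of what a pass/hold record could contain, here is a minimal Python sketch. The `ProofPacket` class and its field names are hypothetical and are not Tether's actual packet schema. The only idea taken from the text above is that the model, target hardware, and checks resolve to a single pass or hold verdict.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class ProofPacket:
    """Hypothetical deployability record; not Tether's real schema."""
    model: str
    hardware: str
    checks: dict[str, bool] = field(default_factory=dict)

    @property
    def verdict(self) -> str:
        # A release passes only if at least one check ran and all succeeded.
        return "pass" if self.checks and all(self.checks.values()) else "hold"

    def to_json(self) -> str:
        # Serialize the fields plus the derived verdict for archiving.
        record = asdict(self)
        record["verdict"] = self.verdict
        return json.dumps(record, indent=2, sort_keys=True)


packet = ProofPacket(
    model="example/vla-policy",   # placeholder model id
    hardware="jetson-agx-orin",   # placeholder target name
    checks={"export_parity": True, "latency_budget": True, "safety_replay": False},
)
print(packet.verdict)  # hold
```

Treating an empty check list as a hold is a design choice in this sketch: a packet with no evidence should not read as a pass.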

Optimize → Prove → Release → Comply

Four surfaces, one deployability system.

The source-available Tether engine creates local evidence. FastCrest Cloud runs proof jobs on rented GPUs. Fleet Release moves approved artifacts through robot cohorts. FastCrest Comply turns the evidence into a shared-auth conformity workspace.

How it works

From a HuggingFace model to a robot, in four steps.

tether go --model <hf_id> runs all four. Each step writes a verifiable artifact and refuses to ship if its check fails — bad exports never reach a robot.

1. Pull: tether models pull, from HuggingFace
2. Export: torch → ONNX, cos = +1.0 verified
3. Serve: FastAPI + ORT-TRT, /act + composable wedges
4. Act: edge GPU → robot, 10–50 ms / chunk
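
The refuse-to-ship behavior can be pictured as a chain of gated steps. This is a minimal sketch, not Tether's implementation: the step checks are stand-in lambdas, and the names `run_pipeline` and `StepFailed` are invented for illustration.

```python
from typing import Callable


class StepFailed(RuntimeError):
    """Raised when a step's check fails, stopping the pipeline."""


def run_pipeline(steps: list[tuple[str, Callable[[], bool]]]) -> list[str]:
    """Run steps in order and stop at the first failed check."""
    completed: list[str] = []
    for name, check in steps:
        if not check():
            raise StepFailed(f"{name} failed after {completed}; nothing ships")
        completed.append(name)
    return completed


# Stand-in checks; a real run would pull, export, serve, and act.
steps = [
    ("pull", lambda: True),
    ("export", lambda: False),  # simulate a parity failure
    ("serve", lambda: True),
    ("act", lambda: True),
]

try:
    run_pipeline(steps)
except StepFailed as err:
    print(err)  # export failed after ['pull']; nothing ships
```

Because the failure is raised before later steps run, a bad export in this sketch can never reach the serve or act stage.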

What it looks like

Talk to your robot fleet in plain English.

tether chat wraps the entire CLI surface in a natural-language agent. 100 calls/day free, no signup, no API key.

Real tether chat session: the agent calls list_models, reads the registry, picks the model that fits your hardware, and explains why.

Composable wedges

Every flag is opt-in. Compose only what you need.

14 runtime wedges layer on top of tether serve — safety, observability, optimization, transport. Enable only the checks your deployment needs, then export the evidence they produce.
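
One way to think about opt-in, composable layers is as wrappers around a policy function. The sketch below is an assumption about the general pattern, not Tether's wedge API: `clamp_wedge`, `counting_wedge`, and `base_policy` are hypothetical names, and the clamp only loosely stands in for a safety check.

```python
from typing import Callable, Sequence

Action = list[float]
Policy = Callable[[Sequence[float]], Action]


def clamp_wedge(policy: Policy, low: float, high: float) -> Policy:
    """Safety-style layer: clamp every action value into [low, high]."""
    def wrapped(obs: Sequence[float]) -> Action:
        return [min(max(a, low), high) for a in policy(obs)]
    return wrapped


def counting_wedge(policy: Policy, counter: dict[str, int]) -> Policy:
    """Observability-style layer: count how many times the policy is called."""
    def wrapped(obs: Sequence[float]) -> Action:
        counter["calls"] = counter.get("calls", 0) + 1
        return policy(obs)
    return wrapped


def base_policy(obs: Sequence[float]) -> Action:
    # Toy policy: scale the observation by two.
    return [2.0 * x for x in obs]


stats: dict[str, int] = {}
# Each layer is optional; leaving one out leaves the others unchanged.
policy = counting_wedge(clamp_wedge(base_policy, -1.0, 1.0), stats)
print(policy([0.25, 0.9, -3.0]))  # [0.5, 1.0, -1.0]
```

Each wrapper takes a policy and returns a policy, so any subset can be stacked in any order without the base policy knowing about them.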

Numbers

Verifiable claims, not vibes.

All three reproducible with one command on Modal. See the parity ledger and changelog for full provenance.

9× faster than monolithic ONNX (decomposed pi0.5 on Jetson AGX Orin)
5.55× TensorRT FP16 vs ORT-CUDA on A10G (SmolVLA monolithic)
cos=+1.000000 numerical parity to PyTorch on the four verified major open VLA families
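
For readers unfamiliar with the parity figure, cos here is the cosine similarity between a reference output and the exported model's output. The sketch below shows the metric in plain Python. The vectors are made-up numbers, and `parity_ok` with its `tol` threshold is a hypothetical helper, not Tether's verification code.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def parity_ok(reference, candidate, tol=1e-6):
    """True when the candidate points in (almost) the same direction."""
    return 1.0 - cosine_similarity(reference, candidate) <= tol


reference = [0.12, -0.48, 0.91, 0.05]     # made-up reference actions
exported = [x + 1e-9 for x in reference]  # made-up exported-model actions

print(parity_ok(reference, exported))                 # True
print(parity_ok(reference, [-x for x in reference]))  # False
```

Cosine similarity ignores overall scale, so a check built this way would typically be paired with a magnitude test such as maximum absolute error.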

vs other tools

Where Tether fits.

Tether is deliberately narrow. Here's the honest read.

Feature | Tether | Triton | HF Endpoints | Raw ONNX
Edge GPU deployment | design center | cloud-first | cloud-only | DIY
VLA-specific export (pi0 / pi0.5 / SmolVLA / GR00T; registry paths for OpenVLA / DreamZero) | built-in, verified core | no | no | manual, error-prone
Verified machine-precision parity | automatic | DIY | DIY | DIY
Decomposed pi0.5 (9× speedup) | one flag | no | no | ~weeks of work
Setup time | 30 seconds | days | minutes | 1–3 weeks
Multi-tenant cloud serving at scale | not the design | battle-tested | managed | DIY

Honest details: vs other tools →

Common questions

FAQ

Why was Reflex renamed to Tether?
One brand, one name. The source-available deploy CLI is now Tether; the hosted optimize-and-verify SaaS is FastCrest Cloud; the regulated-AI evidence bundle is FastCrest Comply. The package on PyPI is fastcrest-tether (the bare name tether is reserved on PyPI), the GitHub repo is github.com/FastCrest/tether, and every CLI command is still tether .... Old reflex URLs 301-redirect for the foreseeable future. The legacy reflex-vla package on PyPI stays pinned at 0.11.x; new releases ship as fastcrest-tether.
Why not just use Triton?
Triton is excellent for multi-tenant cloud inference at scale — many models, many services, datacenter GPUs, an ML platform team. Tether is for the opposite: one model, one robot, one process, edge GPU, one developer. Tether ships VLA-specific features Triton doesn't (decomposed pi0.5, A2C2, ActionGuard with URDF, episode-aware policy routing). The two compose — you can tether export and drop the result into Triton if you want both. Full comparison →
Does it work without a GPU?
Yes. Install with pip install 'fastcrest-tether[serve,onnx]'. The CPU path runs the four verified major open VLA families at machine-precision parity to PyTorch; SmolVLA is the only one fast enough for real-time control on CPU. tether chat needs no further setup once the package is installed.
What does BSL 1.1 mean for me?
Tether (formerly Reflex) is source-available under BSL 1.1, the license MariaDB created and HashiCorp later adopted. Free for personal, academic, and commercial use, including embedding in your own product. The only restriction is offering Tether itself as a competing hosted service. Auto-converts to Apache 2.0 four years after each release. License details →
Does it work on RTX 5090 / Blackwell?
ONNX Runtime 1.25.1+ includes Blackwell kernels, but RTX 50-series and B200/GB200 deployments still need a smoke pass before production use. Run tether doctor and the local serve smoke on the target GPU before promoting; use non-Blackwell cloud GPUs for fallback proof jobs when needed.
Can I use this in a commercial product?
Yes — BSL 1.1 explicitly permits commercial use, including embedding in proprietary products you ship to your own customers. The only restricted case is offering Tether itself as a competing hosted service. Most legitimate use cases (deploying to your own robots, your own labs, your own customers) are clearly in the free bucket.
How does this compare to NVIDIA's GR00T runtime?
Tether is, as far as we know, the only source-available one-command deploy path for GR00T. NVIDIA's runtime is closed-source and locked to their hardware. Tether supports GR00T alongside pi0 / pi0.5 / SmolVLA: multi-vendor, source-available, and working on Jetson Orin (not just Thor).
What's the Pro tier?
The source-available CLI stays free for local export, serve, replay, and evidence generation. Paid offerings are for repeated team proof runs, hosted GPU verification, compliance workspaces, and continuous self-distillation. Pricing details →

Get in touch

Deploying a VLA? Send the model family, target hardware, replay session, and the proof you need. For quick questions, the Discord is usually faster.