Prove a robot policy is safe to deploy before it touches production hardware.
Tether (formerly Reflex) is the deployment-proof CLI for vision-language-action models. Export with parity, serve under a latency budget, replay real traces, enforce ActionGuard safety, and produce the evidence packet a robotics team needs before putting a policy on hardware.
Deployment Proof
A pass/hold packet for robot releases.
Run Tether against the model, target hardware, replay session, and safety config you actually plan to ship. The output is a concrete deployability record, not a demo screenshot.
Proof packet
Artifact hashes, PyTorch parity, p50/p95/p99 latency, deadline misses, stale-action windows, and target hardware details in one signed receipt.
Rollout confidence
Record-replay traces, policy-version evidence, shadow/canary gates, warm-swap checks, and rollback signals before promotion.
Audit evidence
ActionGuard summaries, safety violations, SBOM, vulnerability handling, and compliance gaps packaged for technical review.
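The latency figures in a proof packet can be reproduced from raw timings. The sketch below is not Tether's implementation: it assumes a plain list of per-inference latencies in milliseconds and a single deadline, and the names `percentile`, `latency_summary`, and `budget_ms` are hypothetical. It uses the nearest-rank percentile definition; other tools may interpolate and report slightly different values.

```python
import math


def percentile(samples, q):
    """Nearest-rank percentile: smallest sample with at least q% of values at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def latency_summary(latencies_ms, budget_ms):
    """Summarize latencies and count inferences that exceeded the deadline."""
    misses = sum(1 for x in latencies_ms if x > budget_ms)
    return {
        "p50_ms": percentile(latencies_ms, 50),
        "p95_ms": percentile(latencies_ms, 95),
        "p99_ms": percentile(latencies_ms, 99),
        "deadline_misses": misses,
        "miss_rate": misses / len(latencies_ms),
    }


# Illustrative data only: 100 samples at 1 ms, 2 ms, ..., 100 ms.
samples = [float(i) for i in range(1, 101)]
summary = latency_summary(samples, budget_ms=95.0)
print(summary)
```

Reporting the miss count alongside the percentiles matters because a healthy p95 can still hide a handful of late actions that a control loop would notice.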
Optimize → Prove → Release → Comply
Four surfaces, one deployability system.
The source-available Tether engine creates local evidence. FastCrest Cloud runs proof jobs on rented GPUs. Fleet Release moves approved artifacts through robot cohorts. FastCrest Comply turns the evidence into a shared-auth conformity workspace.
Engine · source-available
The CLI you just installed. Optimize and serve pi0 / pi0.5 / SmolVLA / GR00T policies on edge GPUs, with ActionGuard safety and a tamper-evident audit log. Free, BSL 1.1.
Cloud · hosted
Upload a model, target chip, replay session, and latency budget. Cloud GPUs return a verified edge artifact, signed receipt, and deployment proof packet.
Fleet Release · rollout
Move a verified artifact through staging, canary, and production robot cohorts with health gates, rollback, and release receipts.
Comply · EU evidence
Turn the audit log, signed cert, SBOM, and ActionGuard summary into a live compliance workspace for AI Act, CRA, and Machinery Regulation review.
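One common way to make an audit log tamper-evident is a hash chain, where each entry commits to the hash of the entry before it. The sketch below illustrates that general technique and is not a description of Tether's log format: the entry layout, the `GENESIS` value, and the event payloads are all hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # hypothetical starting value for the chain


def _digest(prev_hash, payload):
    """Hash the previous entry's hash together with a canonical JSON payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_entry(log, payload):
    """Append an entry that commits to everything written before it."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    entry = {"payload": payload, "prev": prev_hash, "hash": _digest(prev_hash, payload)}
    log.append(entry)
    return entry


def verify_log(log):
    """Recompute the chain; any edited, dropped, or reordered entry breaks it."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True


log = []
append_entry(log, {"event": "policy_loaded", "version": "v1"})  # illustrative events
append_entry(log, {"event": "action_clamped", "joint": 3})
intact = verify_log(log)

log[0]["payload"]["version"] = "v2"  # simulate after-the-fact tampering
tampered_detected = not verify_log(log)
print(intact, tampered_detected)
```

A chain like this only detects edits. Proving who wrote the log requires signing the final hash, which is a separate step.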
How it works
From a HuggingFace model to a robot, in four steps.
tether go --model <hf_id> runs all four. Each step writes a verifiable artifact and refuses to ship if its check fails — bad exports never reach a robot.
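The refuse-to-ship behavior described above is a fail-closed pipeline: a failed check stops everything after it. The sketch below shows the pattern in isolation, not how `tether go` is implemented. The step names (`export`, `parity`, `serve`) and the `StepFailed` exception are hypothetical placeholders.

```python
class StepFailed(Exception):
    """Raised when a step's check fails, so later steps never run."""


def run_pipeline(steps):
    """Run (name, step) pairs in order; each step returns (ok, artifact)."""
    artifacts = []
    for name, step in steps:
        ok, artifact = step()
        if not ok:
            raise StepFailed(f"{name} check failed; halting before later steps")
        artifacts.append((name, artifact))
    return artifacts


calls = []  # records which steps actually executed


def make_step(name, ok):
    """Build a stand-in step that logs its execution and reports a fixed result."""
    def step():
        calls.append(name)
        return ok, {"step": name}
    return step


# Hypothetical step names; the second check is forced to fail.
steps = [
    ("export", make_step("export", True)),
    ("parity", make_step("parity", False)),
    ("serve", make_step("serve", True)),
]

try:
    run_pipeline(steps)
    shipped = True
except StepFailed as err:
    shipped = False
    print(err)
```

Raising instead of returning a status code means a caller cannot accidentally ignore the failure and continue to the next stage.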
What it looks like
Talk to your robot fleet in plain English.
tether chat wraps the entire CLI surface in a natural-language agent. 100 calls/day free, no signup, no API key.
tether chat session: agent calls list_models, reads the registry, picks the model that fits your hardware, explains why.
Composable wedges
Every flag is opt-in. Compose only what you need.
14 runtime wedges layer on top of tether serve — safety, observability, optimization, transport. Enable only the checks your deployment needs, then export the evidence they produce.
Multi-embodiment
pi0, pi0.5, SmolVLA, GR00T — all four major open VLA families. ONNX export verified at cos = +1.000000 against PyTorch.
Edge-first
Jetson Orin Nano (8 GB) → AGX Orin → Thor → desktop NVIDIA. Hardware probe picks the right variant; export targets the right precision.
SnapFlow distillation
First open-source SnapFlow reproduction. Distill any pi0 / pi0.5 to a 1-step student that beats its 10-step teacher (64% vs 56% on libero_object).
Production runtime
CUDA graphs, cost-weighted batching, A2C2 correction, record-replay traces, real-time chunking — composable wedges on a single FastAPI server.
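The parity figure quoted above is a cosine similarity between the exported model's outputs and the PyTorch reference. As a rough illustration of that metric (not Tether's verification code), the sketch below compares two flattened action vectors in pure Python. The `min_cos` threshold and the sample vectors are assumptions chosen for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def parity_ok(reference, candidate, min_cos=0.999999):
    """Pass only if the candidate output points the same way as the reference."""
    return cosine_similarity(reference, candidate) >= min_cos


# Illustrative values standing in for one action chunk from each runtime.
reference = [0.12, -0.5, 0.33, 0.9]
candidate = [x + 1e-7 for x in reference]  # tiny numerical drift
flipped = [-x for x in reference]          # sign error in the export

print(parity_ok(reference, candidate), parity_ok(reference, flipped))
```

Cosine similarity ignores magnitude, so a uniformly scaled output would still score 1.0. Pairing it with a maximum absolute error check covers that case.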
Numbers
Verifiable claims, not vibes.
All three are reproducible with one command on Modal. See the parity ledger and changelog for full provenance.
vs other tools
Where Tether fits.
Tether is deliberately narrow. Here's the honest read.
| | Tether | Triton | HF Endpoints | Raw ONNX |
|---|---|---|---|---|
| Edge GPU deployment | design center | cloud-first | cloud-only | DIY |
| VLA-specific export (pi0 / pi0.5 / SmolVLA / GR00T; registry paths for OpenVLA / DreamZero) | built-in, verified core | no | no | manual, error-prone |
| Verified machine-precision parity | automatic | DIY | DIY | DIY |
| Decomposed pi0.5 (9× speedup) | one flag | no | no | ~weeks of work |
| Setup time | 30 seconds | days | minutes | 1–3 weeks |
| Multi-tenant cloud serving at scale | not the design | battle-tested | managed | DIY |
Honest details: vs other tools →
Common questions
FAQ
Why was Reflex renamed to Tether?
fastcrest-tether (the bare name tether is reserved on PyPI), the GitHub repo is github.com/FastCrest/tether, and every CLI command is still tether .... Old reflex URLs 301-redirect for the foreseeable future; the legacy reflex-vla package on PyPI stays pinned at 0.11.x — new releases ship as fastcrest-tether.
Why not just use Triton?
tether export and drop the result into Triton if you want both. Full comparison →
Does it work without a GPU?
pip install 'fastcrest-tether[serve,onnx]'. The CPU path runs the four verified major open VLA families at machine-precision parity to PyTorch; SmolVLA is the only one fast enough for real-time control on CPU. tether chat needs no signup or API key once the package is installed.
What does BSL 1.1 mean for me?
Does it work on RTX 5090 / Blackwell?
tether doctor and the local serve smoke on the target GPU before promoting; use non-Blackwell cloud GPUs for fallback proof jobs when needed.
Can I use this in a commercial product?
How does this compare to NVIDIA's GR00T runtime?
What's the Pro tier?
Get in touch
Deploying a VLA? Send the model family, target hardware, replay session, and the proof you need. For quick questions, the Discord is usually faster.