Reflex

Take a robot policy off the training cluster and onto a robot.

Reflex is the deployment layer for vision-language-action models. Take any pi0, pi0.5, SmolVLA, or GR00T checkpoint and run it on a Jetson Orin or a desktop GPU with one command, verified at machine-precision parity to the PyTorch reference. Or just tell the chat agent to do it for you.

$ curl -fsSL https://fastcrest.com/install | sh && reflex chat
v0.7 · BSL 1.1 · Python ≥ 3.10

What it looks like

$ reflex chat
connected: chat.fastcrest.com (model=gpt-5-mini)

you › deploy SmolVLA to my desktop GPU and start serving

  → list_targets({})
  → pull_model({"model": "smolvla-base"})     ↓ 900 MB from HuggingFace
  → export_model({"model": "smolvla-base", "target": "desktop"})
  → serve_model({"export_dir": "./reflex_export"})

SmolVLA is exported and serving at http://localhost:8000.
Latency: ~12 ms/call on your GPU. Try:
  curl -X POST http://localhost:8000/act ...
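Once the server is up, any HTTP client can drive it. A minimal Python sketch of a client call is below; the payload schema (field names, state encoding) is an assumption for illustration, not Reflex's documented `/act` API.

```python
import json

# Hypothetical request body for the /act endpoint -- the exact schema is
# an assumption; consult the served API for the real one.
def build_act_request(instruction, joint_positions):
    """Assemble a JSON body for a single policy call."""
    return {
        "instruction": instruction,      # natural-language task prompt
        "state": list(joint_positions),  # proprioceptive state vector
    }

payload = build_act_request("pick up the red block", [0.0] * 7)
body = json.dumps(payload)

# To send it against a running server (commented out: needs `reflex serve`):
# import urllib.request
# req = urllib.request.Request(
#     "http://localhost:8000/act", data=body.encode(),
#     headers={"Content-Type": "application/json"})
# actions = json.loads(urllib.request.urlopen(req).read())
```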

Multi-embodiment

pi0, pi0.5, SmolVLA, GR00T — all four major open VLA families. ONNX export verified at cos = +1.000000 against the PyTorch reference.
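The parity check behind that number can be sketched as a flattened cosine similarity between the two backends' outputs; the random tensors below are toy stand-ins for the PyTorch and ONNX action chunks on the same input.

```python
import numpy as np

def cosine_parity(ref: np.ndarray, exported: np.ndarray) -> float:
    """Cosine similarity between flattened reference and exported outputs."""
    a, b = ref.ravel(), exported.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy stand-ins: exported output = reference plus tiny numerical noise.
ref = np.random.default_rng(0).standard_normal((50, 7)).astype(np.float32)
noise = np.random.default_rng(1).standard_normal((50, 7)).astype(np.float32)
exported = ref + 1e-7 * noise

cos = cosine_parity(ref, exported)   # effectively +1.000000
```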

Edge-first

Jetson Orin Nano (8 GB) → AGX Orin → Thor → desktop NVIDIA. Hardware probe picks the right variant; export targets the right precision.
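The probe-then-pick idea can be sketched as a small mapping from the detected device to an export variant. The target names, precision choices, and VRAM threshold below are illustrative assumptions, not Reflex's actual target table.

```python
# Hypothetical hardware-probe mapping: device name + VRAM -> export variant.
def pick_variant(device_name: str, vram_gb: float) -> tuple[str, str]:
    """Choose an export target and precision from a hardware probe."""
    name = device_name.lower()
    if "orin nano" in name:
        return "jetson-orin-nano", "fp16"   # 8 GB board: FP16 engine
    if "agx orin" in name:
        return "jetson-agx-orin", "fp16"
    if "thor" in name:
        return "jetson-thor", "fp16"
    # Desktop GPUs: FP16 when enough VRAM is available, else FP32.
    return "desktop", "fp16" if vram_gb >= 8 else "fp32"
```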

SnapFlow distillation

First open-source SnapFlow reproduction. Distill any pi0 or pi0.5 checkpoint into a 1-step student that beats its 10-step teacher on LIBERO.

Production runtime

CUDA graphs, continuous batching, action-similarity fast-path, A2C2 correction, record-replay traces, real-time chunking — composable wedges on a single FastAPI server.

5.55× — faster than monolithic ONNX (decomposed pi0.5), TensorRT FP16 vs ORT-CUDA on A10G
cos = +1.0 — numerical parity to PyTorch on all 4 VLAs