Talk to your robot fleet.
In plain English.

Reflex is the deployment layer for vision-language-action (VLA) models. Take any pi0, pi0.5, SmolVLA, or GR00T checkpoint and run it on your Jetson — or just tell the chat agent to do it for you. One CLI, no config files, no training-loop boilerplate.

$ pip install reflex-vla && reflex chat
v0.3.2 · Apache 2.0 · Python ≥ 3.10

What it looks like

$ reflex chat
connected: chat.fastcrest.com (model=gpt-5-mini)

you › deploy SmolVLA to my desktop GPU and start serving

  → list_targets({})
  → pull_model({"model": "smolvla-base"})     ↓ 900 MB from HuggingFace
  → export_model({"model": "smolvla-base", "target": "desktop"})
  → serve_model({"export_dir": "./reflex_export"})

SmolVLA is exported and serving at http://localhost:8000.
Latency: ~12 ms/call on your GPU. Try:
  curl -X POST http://localhost:8000/act ...
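The truncated `curl` call above could also be driven from Python. A minimal client sketch follows; note that the `/act` request schema (an `observation` image plus an `instruction` string) is an assumption for illustration, not Reflex's documented API:

```python
# Hypothetical client for the serving endpoint started above.
# The field names "observation" and "instruction" are assumptions.
import base64
import json

def build_act_request(image_bytes: bytes, instruction: str) -> dict:
    """Assemble a JSON-serializable payload for a POST to /act."""
    return {
        "observation": {"image": base64.b64encode(image_bytes).decode("ascii")},
        "instruction": instruction,
    }

payload = build_act_request(b"<jpeg bytes here>", "pick up the red block")
body = json.dumps(payload)  # send with requests.post("http://localhost:8000/act", data=body)
```

Check the server's actual schema (e.g. via its OpenAPI docs at `/docs` if exposed) before relying on these field names.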

Multi-embodiment

pi0, pi0.5, SmolVLA, GR00T — all four major open VLA families. Every ONNX export is verified against its PyTorch reference, with output cosine similarity of +1.0000000.
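The export check boils down to comparing flattened output vectors from the two runtimes. A minimal sketch of such a parity check, with NumPy arrays standing in for the PyTorch and ONNX Runtime outputs:

```python
# Sketch of an export parity check: compare flattened action outputs from
# the reference model and the exported model via cosine similarity.
# The arrays here are stand-ins; in practice they would come from a
# PyTorch forward pass and an ONNX Runtime session on the same input.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity of two tensors, flattened to vectors."""
    a, b = a.ravel(), b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

ref = np.array([0.12, -0.40, 0.88, 0.05])  # stand-in: PyTorch output
exp = ref.copy()                           # stand-in: ONNX Runtime output
assert cosine_similarity(ref, exp) > 0.999999
```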

Edge-first

Jetson Orin Nano (8 GB) → AGX Orin → Thor → desktop NVIDIA. Hardware probe picks the right variant; export targets the right precision.
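A probe-then-select flow can be pictured as a tier-to-precision table. This sketch is illustrative only: the tier keys mirror the hardware list above, but the precision choices and fallback are assumptions, not Reflex's actual policy:

```python
# Hypothetical hardware-tier -> export-precision table. The mapping below
# is an illustrative assumption, not Reflex's real selection logic.
PRECISION_BY_TIER = {
    "jetson-orin-nano-8gb": "int8",   # tightest memory budget
    "jetson-agx-orin": "fp16",
    "jetson-thor": "fp16",
    "desktop-nvidia": "fp16",
}

def pick_precision(tier: str) -> str:
    """Choose an export precision; fall back to fp32 for unknown hardware."""
    return PRECISION_BY_TIER.get(tier, "fp32")
```

The conservative fp32 fallback trades speed for correctness when the probe meets hardware it does not recognize.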

SnapFlow distillation

First open-source SnapFlow reproduction. Distill any pi0/pi0.5 to a 1-step student that beats its 10-step teacher on LIBERO.
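The intuition behind 1-step distillation: a multi-step sampler Euler-integrates a velocity field, and the student learns to jump straight to where that trajectory ends. A toy sketch with a linear velocity field v(x) = target − x (this illustrates the one-jump idea only, not the SnapFlow training objective):

```python
# Toy illustration of multi-step vs one-step sampling. The velocity field
# and closed-form student are contrived for clarity; a real distilled
# student is a network trained to predict the teacher's net displacement.
import numpy as np

def teacher_sample(x0: np.ndarray, target: np.ndarray, steps: int = 10) -> np.ndarray:
    """10-step Euler integration of the toy field v(x) = target - x."""
    x, dt = x0.copy(), 1.0 / steps
    for _ in range(steps):
        x = x + dt * (target - x)
    return x

def student_sample(x0: np.ndarray, target: np.ndarray, steps: int = 10) -> np.ndarray:
    """One jump that lands where the teacher's trajectory ends.
    (1 - 1/steps)**steps is the closed form of the Euler recursion above.)"""
    shrink = (1.0 - 1.0 / steps) ** steps
    return target + shrink * (x0 - target)

x0, target = np.array([2.0, -1.0]), np.array([0.5, 0.5])
assert np.allclose(teacher_sample(x0, target), student_sample(x0, target))
```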

Production runtime

CUDA graphs, continuous batching, action-similarity fast-path, A2C2 correction, record-replay traces, real-time chunking — composable wedges on a single FastAPI server.

- faster than monolithic ONNX (decomposed pi0.5)
- 5 VLA families ship-ready
- 5 edge hardware tiers