Reflex is the deployment layer for vision-language-action (VLA) models. Take any pi0, pi0.5, SmolVLA, or GR00T checkpoint and run it on your Jetson — or just tell the chat agent to do it for you. One CLI, no config files, no training-loop boilerplate.
$ reflex chat
connected: chat.fastcrest.com (model=gpt-5-mini)
you › deploy SmolVLA to my desktop GPU and start serving
→ list_targets({})
→ pull_model({"model": "smolvla-base"}) ↓ 900 MB from HuggingFace
→ export_model({"model": "smolvla-base", "target": "desktop"})
→ serve_model({"export_dir": "./reflex_export"})
SmolVLA is exported and serving at http://localhost:8000.
Latency: ~12 ms/call on your GPU. Try:
curl -X POST http://localhost:8000/act ...
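A minimal Python client for that endpoint might look like the following. The payload field names (`observation`, `instruction`) and the 7-DoF state vector are assumptions for illustration, not Reflex's documented request schema:

```python
import json
import urllib.request

# Hypothetical /act request; field names are assumed, not Reflex's actual schema.
payload = {
    "observation": {"state": [0.0] * 7},       # e.g. proprioceptive joint angles
    "instruction": "pick up the red block",
}
req = urllib.request.Request(
    "http://localhost:8000/act",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# actions = json.load(urllib.request.urlopen(req))  # uncomment against a live server
```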
pi0, pi0.5, SmolVLA, GR00T: all four major open VLA families. ONNX export is verified against the PyTorch reference at a cosine similarity of +1.0000000.
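The parity check behind that number can be sketched as a flat cosine similarity between the two runtimes' action outputs. The toy vectors and the tolerance below are illustrative, not Reflex's actual verification gate:

```python
import math

def cosine(a, b):
    # flatten-and-dot cosine similarity between two action tensors
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

torch_out = [0.12, -0.48, 0.93, 0.05]   # made-up PyTorch reference actions
onnx_out  = [0.12, -0.48, 0.93, 0.05]   # made-up ONNX Runtime actions
similarity = cosine(torch_out, onnx_out)  # ≈ 1.0 for matching exports
```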
Jetson Orin Nano (8 GB) → AGX Orin → Thor → desktop NVIDIA GPUs. The hardware probe picks the right model variant; the export targets the right precision.
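A probe-to-precision mapping could work roughly like the sketch below; the memory thresholds, SM-architecture cutoff, and precision choices are invented for illustration and are not Reflex's actual logic:

```python
def pick_precision(total_mem_gb: int, sm_arch: int) -> str:
    """Toy mapping from probed GPU properties to an export precision."""
    if sm_arch >= 90:          # assumed: Hopper-or-newer class with FP8 support
        return "fp8"
    if total_mem_gb >= 16:     # assumed: enough headroom for an FP16 engine
        return "fp16"
    return "int8"              # assumed: quantize to fit an 8 GB Orin Nano

variant = pick_precision(total_mem_gb=8, sm_arch=87)   # Orin Nano-like probe
```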
The first open-source SnapFlow reproduction: distill any pi0 or pi0.5 checkpoint into a 1-step student that beats its 10-step teacher on LIBERO.
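The distillation idea can be illustrated with a toy rectified-flow field: the teacher integrates its velocity field for 10 Euler steps, and the 1-step student regresses the straight jump from noise to the teacher's endpoint. This is a simplified sketch of that setup, not the SnapFlow recipe verbatim:

```python
def teacher_velocity(x, t, target):
    # straight-line (rectified-flow style) field pointing at the target action
    return [(g - xi) / max(1.0 - t, 1e-6) for xi, g in zip(x, target)]

def teacher_sample(noise, target, steps=10):
    # 10-step Euler integration of the teacher's velocity field
    x, dt = list(noise), 1.0 / steps
    for i in range(steps):
        v = teacher_velocity(x, i * dt, target)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

noise, target = [0.0, 1.0], [0.5, -0.2]
endpoint = teacher_sample(noise, target)
# the 1-step student's regression target is the single jump noise -> endpoint:
student_target_velocity = [e - n for e, n in zip(endpoint, noise)]
```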
CUDA graphs, continuous batching, action-similarity fast-path, A2C2 correction, record-replay traces, real-time chunking — composable wedges on a single FastAPI server.
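One of those wedges, the action-similarity fast-path, can be sketched as a one-entry cache that reuses the last action chunk when the new observation embedding is nearly identical, skipping the VLA forward pass. The class name, threshold, and embedding inputs here are hypothetical:

```python
import math

class ActionCache:
    """Toy action-similarity fast-path: serve the cached chunk when the
    new observation embedding is close enough to the last served one."""

    def __init__(self, threshold: float = 0.99):
        self.threshold = threshold
        self.last_emb = None
        self.last_actions = None

    def lookup(self, emb):
        if self.last_emb is None:
            return None
        dot = sum(a * b for a, b in zip(emb, self.last_emb))
        norm = (math.sqrt(sum(a * a for a in emb))
                * math.sqrt(sum(b * b for b in self.last_emb)))
        if norm and dot / norm >= self.threshold:
            return self.last_actions     # cache hit: skip model inference
        return None                      # cache miss: run the model

    def store(self, emb, actions):
        self.last_emb, self.last_actions = emb, actions
```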