2026-05-04 · v0.4.0 · MAJOR
Sandbox · multi-LLM playground
Test prompts on Claude Sonnet 4.6, GPT-5.5, Gemini 2 Pro, and Llama 4 side by side. Score each output against an eval rubric, then promote the winners to prod.
- +New /sandbox page with side-by-side comparison
- +Auto eval rubric (5 criteria: format, coherence, accuracy, brevity, actionability)
- +Latency, tokens, cost tracked per provider
- +Promote-to-prod flow for harness library
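
The scoring and tracking described above might look roughly like the sketch below. It is illustrative only, not the sandbox's actual implementation: `RunResult`, `pick_winner`, the 0.0 to 1.0 score scale, the unweighted mean, the cost-based tie-break, and the `provider_a`/`provider_b` labels and their numbers are all assumptions. Only the five criterion names come from this entry.

```python
from dataclasses import dataclass

# The five criteria named in this changelog entry.
CRITERIA = ("format", "coherence", "accuracy", "brevity", "actionability")


@dataclass
class RunResult:
    """One provider's output for a prompt (hypothetical record shape)."""
    provider: str             # placeholder label, not a real model name
    scores: dict[str, float]  # assumed scale: 0.0 to 1.0 per criterion
    latency_ms: float
    tokens: int
    cost_usd: float

    def rubric_score(self) -> float:
        # Assumption: unweighted mean across the five criteria.
        missing = [c for c in CRITERIA if c not in self.scores]
        if missing:
            raise ValueError(f"missing criteria: {missing}")
        return sum(self.scores[c] for c in CRITERIA) / len(CRITERIA)


def pick_winner(results: list[RunResult]) -> RunResult:
    # Highest rubric score wins; ties go to the cheaper run (assumed tie-break).
    return max(results, key=lambda r: (r.rubric_score(), -r.cost_usd))


# Made-up example data for two placeholder providers.
results = [
    RunResult("provider_a", dict.fromkeys(CRITERIA, 0.8), 900.0, 400, 0.004),
    RunResult(
        "provider_b",
        {**dict.fromkeys(CRITERIA, 0.8), "accuracy": 1.0},
        1200.0,
        520,
        0.006,
    ),
]

winner = pick_winner(results)
print(winner.provider, round(winner.rubric_score(), 2))
```

Keeping latency, tokens, and cost on the same record as the rubric scores means a promote-to-prod decision could weigh quality against cost without a second lookup. A real rubric would likely weight the criteria rather than average them equally.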