Evaluate
Single-shot evaluation endpoint. Submit a complete conversation and receive a reward score.
Authorizations
API key obtained from Labs Portal
Body
The collection slug containing the scenario
The scenario slug to evaluate against
The conversation history to evaluate
Optional key used to group evaluations in scoring logs (e.g. training run ID, experiment name)
Windows for z-normalization (e.g. "7d", "30d", "100n"). Requires normalization_key. Maximum of 5 windows.
Window string: number + unit (d=days, w=weeks, m=months, n=last N evaluations). Must match the pattern ^\d+[dwmn]$
Reward configuration. Controls which scorers run and their weights. Omitted quality weight keys default to 0, so only the specified scorers execute.
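The window constraints above (pattern, maximum of 5 entries, dependency on normalization_key) can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: the function name validate_windows and its argument names are hypothetical, and only the pattern, the limit of 5, and the normalization_key requirement come from this page.

```python
import re

# Pattern and limit as documented above. fullmatch is used instead of ^...$
# so that a trailing newline is not silently accepted.
WINDOW_PATTERN = re.compile(r"\d+[dwmn]", re.ASCII)
MAX_WINDOWS = 5


def validate_windows(windows, normalization_key=None):
    """Return a list of problems; an empty list means the windows look valid.

    Hypothetical client-side helper; the server remains the source of truth.
    """
    problems = []
    if windows and not normalization_key:
        problems.append("normalization_key is required when windows are given")
    if len(windows) > MAX_WINDOWS:
        problems.append(f"at most {MAX_WINDOWS} windows are allowed, got {len(windows)}")
    for window in windows:
        if not WINDOW_PATTERN.fullmatch(window):
            problems.append(f"invalid window string: {window!r}")
    return problems
```

For instance, validate_windows(["7d", "30d", "100n"], "run-1") returns an empty list, while a window such as "7x" or a list of six windows produces one problem each.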
Response
Success
Reward score (0 to 1). Required range: 0 <= x <= 1
Version of the scoring algorithm used
Optional breakdown of individual score dimensions. When omitted, only the combined reward is provided.
Textual reasoning per score dimension. Only included for subscriptions with full score access.
Z-normalized scores per requested time window. Only present when normalization_key and normalization_windows are provided.
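To make the meaning of a z-normalized score concrete, the sketch below computes a standard z-score of one reward against the rewards in a window. It is an illustration of the general technique only: this page does not state whether the service uses a population or sample standard deviation, or how it handles small or constant windows, so those choices (and the name z_normalize) are assumptions.

```python
from statistics import mean, pstdev


def z_normalize(score, window_scores):
    """Standard z-score: (score - mean) / standard deviation.

    Assumptions (not confirmed by the API docs): population standard
    deviation is used, and None is returned when the window has fewer
    than two scores or no variance.
    """
    if len(window_scores) < 2:
        return None
    mu = mean(window_scores)
    sigma = pstdev(window_scores)
    if sigma == 0:
        return None
    return (score - mu) / sigma
```

A positive value means the reward is above the window's mean and a negative value means it is below, measured in standard deviations.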
Additional evaluation metadata