POST /api/v1/evaluate
cURL
curl --request POST \
  --url https://labs.tacitintelligence.co/api/v1/evaluate \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "collection_slug": "<string>",
  "scenario_slug": "<string>",
  "messages": [
    {
      "content": "<string>",
      "tool_calls": [
        {
          "name": "<string>",
          "input": {}
        }
      ]
    }
  ],
  "normalization_key": "<string>",
  "normalization_windows": [
    "<string>"
  ],
  "reward_config": {
    "quality_weights": {
      "success_metrics": 0.5,
      "failure_metrics": 0.5,
      "best_practices": 0.5,
      "rubrics": 0.5,
      "discovery": 0.5,
      "output_similarity": 0.5,
      "decision_match": 0.5
    },
    "skip_safety_gate": true
  }
}
'
{
  "reward": 0.5,
  "scoring_version": "<string>",
  "scores": {},
  "score_breakdown": {},
  "normalized": {},
  "metadata": {
    "scenario_name": "<string>",
    "evaluation_time_ms": 123
  }
}

Authorizations

Authorization
string
header
required

API key obtained from the Labs Portal, sent as a Bearer token in the Authorization header (Authorization: Bearer <token>).
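
As a rough Python counterpart to the cURL sample, the sketch below builds the authenticated request without sending it. The helper name `build_evaluate_request` and the placeholder slugs and key are illustrative assumptions, not part of the API; only the URL, method, and the two headers come from the sample above.

```python
import json
import urllib.request

API_URL = "https://labs.tacitintelligence.co/api/v1/evaluate"


def build_evaluate_request(token: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request for the evaluate endpoint.

    Hypothetical helper: it mirrors the headers used in the cURL sample.
    """
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values; substitute a real API key and real slugs.
request = build_evaluate_request(
    "YOUR_API_KEY",
    {
        "collection_slug": "example-collection",
        "scenario_slug": "example-scenario",
        "messages": [{"content": "Hello"}],
    },
)
```

The request could then be sent with `urllib.request.urlopen(request)` and the body decoded with `json.load`.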

Body

application/json
collection_slug
string
required

The collection slug containing the scenario

scenario_slug
string
required

The scenario slug to evaluate against

messages
object[]
required

The conversation history to evaluate
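
The request sample shows each message carrying a `content` string and a `tool_calls` array of `name`/`input` pairs. A minimal sketch of assembling that shape follows; `make_message` is a hypothetical helper, the tool name `search_docs` is invented for illustration, and treating `tool_calls` as optional is an assumption that the sample does not confirm.

```python
from typing import Any, Optional


def make_message(
    content: str,
    tool_calls: Optional[list[tuple[str, dict[str, Any]]]] = None,
) -> dict[str, Any]:
    """Build one entry of the `messages` array in the shape of the sample.

    Assumption: `tool_calls` may be left out when the message has none.
    """
    message: dict[str, Any] = {"content": content}
    if tool_calls:
        message["tool_calls"] = [
            {"name": name, "input": tool_input} for name, tool_input in tool_calls
        ]
    return message


messages = [
    make_message("Where is the refund policy documented?"),
    make_message(
        "Let me look that up.",
        tool_calls=[("search_docs", {"query": "refund policy"})],  # hypothetical tool
    ),
]
```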

normalization_key
string

Optional key used to group evaluations in scoring logs (e.g. training run ID, experiment name)

normalization_windows
string[]

Windows for z-normalization (e.g. "7d", "30d", "100n"). Requires normalization_key. Max 5 windows.

Maximum array length: 5

Window string: number + unit (d=days, w=weeks, m=months, n=last N evaluations)

Pattern: ^\d+[dwmn]$
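
The constraints above (window pattern, at most 5 windows, normalization_key required) can be checked client-side before sending a request. The following is a minimal sketch; `validate_normalization` is a hypothetical helper, and the server remains the authority on what it accepts.

```python
import re
from typing import Optional

# Same shape as the documented pattern ^\d+[dwmn]$; fullmatch anchors both ends.
WINDOW_PATTERN = re.compile(r"\d+[dwmn]", re.ASCII)
MAX_WINDOWS = 5


def validate_normalization(
    normalization_key: Optional[str], windows: Optional[list[str]]
) -> list[str]:
    """Return the windows unchanged if they satisfy the documented constraints."""
    if not windows:
        return []
    if not normalization_key:
        raise ValueError("normalization_windows requires normalization_key")
    if len(windows) > MAX_WINDOWS:
        raise ValueError(f"at most {MAX_WINDOWS} windows are allowed")
    invalid = [w for w in windows if not WINDOW_PATTERN.fullmatch(w)]
    if invalid:
        raise ValueError(f"invalid window strings: {invalid}")
    return list(windows)


windows = validate_normalization("run-001", ["7d", "30d", "100n"])
```
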
reward_config
object

Reward configuration. Controls which scorers run and how they are weighted. Any key omitted from quality_weights defaults to 0, so only scorers with an explicitly specified weight run.
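
To illustrate the default-to-0 behaviour, the sketch below builds a `reward_config` with only two scorers and shows the weights that would be in effect. The scorer names are taken from the request sample; whether that list is exhaustive is an assumption, and both helper functions are hypothetical.

```python
# Scorer keys as they appear in the request sample (assumed, not confirmed, to be complete).
KNOWN_SCORERS = (
    "success_metrics",
    "failure_metrics",
    "best_practices",
    "rubrics",
    "discovery",
    "output_similarity",
    "decision_match",
)


def build_reward_config(**weights: float) -> dict:
    """Build a reward_config that lists only the scorers that should run."""
    unknown = sorted(set(weights) - set(KNOWN_SCORERS))
    if unknown:
        raise ValueError(f"unknown scorer keys: {unknown}")
    return {"quality_weights": dict(weights)}


def effective_weights(reward_config: dict) -> dict:
    """Expand quality_weights, filling omitted keys with the documented default of 0."""
    given = reward_config.get("quality_weights", {})
    return {name: given.get(name, 0) for name in KNOWN_SCORERS}


config = build_reward_config(rubrics=0.7, decision_match=0.3)
weights = effective_weights(config)
```

Here only `rubrics` and `decision_match` carry a non-zero weight, so by the rule above only those two scorers would execute.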

Response

Success

reward
number
required

Reward score (0 to 1)

Required range: 0 <= x <= 1
scoring_version
string
required

Version of the scoring algorithm used

scores
object

Optional breakdown of individual score dimensions. When omitted, only the combined reward is provided.

score_breakdown
object

Textual reasoning per score dimension. Only included for subscriptions with full score access.

normalized
object

Z-normalized scores per requested time window. Only present when normalization_key and normalization_windows are provided.

metadata
object

Additional evaluation metadata
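
Because only reward and scoring_version are marked required, client code should treat the remaining response fields as optional. A minimal sketch of reading a decoded response body under that assumption follows; `summarize_evaluation` is a hypothetical helper and the sample values are invented.

```python
from typing import Any


def summarize_evaluation(response: dict[str, Any]) -> dict[str, Any]:
    """Extract the required fields and tolerate the optional ones being absent."""
    reward = response["reward"]
    if not 0 <= reward <= 1:
        raise ValueError(f"reward outside documented range [0, 1]: {reward}")
    metadata = response.get("metadata") or {}
    return {
        "reward": reward,
        "scoring_version": response["scoring_version"],
        "scores": response.get("scores"),  # may be omitted
        "score_breakdown": response.get("score_breakdown"),  # full score access only
        "normalized": response.get("normalized"),  # only with key and windows
        "evaluation_time_ms": metadata.get("evaluation_time_ms"),
    }


# Invented sample body in the shape of the response example.
sample = {
    "reward": 0.5,
    "scoring_version": "example-version",
    "metadata": {"scenario_name": "Example scenario", "evaluation_time_ms": 123},
}
summary = summarize_evaluation(sample)
```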