Training Datasets

A training dataset is a collection of completed scenario rollouts, packaged as NDJSON for supervised fine-tuning. Each row is one capture session, rendered as a sequence of user and assistant messages with metadata describing how that rollout was produced. The model being trained takes the practitioner role. Operators interacting with simulated personas during scenario play generate the assistant messages; the simulated persona generates the user messages. Operator-driven moments like “deliberately stay silent” are surfaced as discrete learnable targets via canonical tokens (see Canonical Tokens below).

Lifecycle

Status | Export behaviour
pending, preview, running, filtering | 202 Accepted with status payload. Poll /status until ready.
ready | 200 OK with Content-Type: application/x-ndjson. Body streams one row per line.
cancelled | 409 Conflict with status payload. No final NDJSON.
failed | 500 Internal Server Error with status payload including a stored error string.
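
Client code often needs the same mapping, for example to decide whether a value returned by /status means it should keep polling. The sketch below encodes the table as a TypeScript lookup. The names DatasetStatus, EXPORT_HTTP_STATUS, isDatasetStatus, and isTerminal are illustrative rather than part of the API, and the sketch assumes these seven values are the complete set of statuses.

```typescript
// Dataset statuses from the lifecycle table (assumed to be the complete set).
type DatasetStatus =
  | 'pending'
  | 'preview'
  | 'running'
  | 'filtering'
  | 'ready'
  | 'cancelled'
  | 'failed';

// HTTP status that /export returns for each dataset status.
const EXPORT_HTTP_STATUS: Record<DatasetStatus, number> = {
  pending: 202,
  preview: 202,
  running: 202,
  filtering: 202,
  ready: 200,
  cancelled: 409,
  failed: 500,
};

// Narrow an untyped status string; the own-property check avoids matching
// inherited keys such as "toString".
function isDatasetStatus(value: string): value is DatasetStatus {
  return Object.prototype.hasOwnProperty.call(EXPORT_HTTP_STATUS, value);
}

// Statuses at which a poller should stop; only ready yields an export.
function isTerminal(status: DatasetStatus): boolean {
  return status === 'ready' || status === 'cancelled' || status === 'failed';
}
```

A poller can then stop once isDatasetStatus(status) && isTerminal(status) holds, and treat any other terminal status except ready as an error.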

Polling Pattern

/status is cheap and safe to call repeatedly. /export is the heavy NDJSON download and is only meaningful when status='ready'.
const pollUntilReady = async (datasetId: string): Promise<void> => {
  while (true) {
    const res = await fetch(
      `${process.env.LABS_URL}/api/v1/training-datasets/${encodeURIComponent(datasetId)}/status`,
      { headers: { Authorization: `Bearer ${process.env.LABS_KEY}` } },
    );
    // Surface auth or lookup errors instead of reading an error body as a status payload.
    if (!res.ok) throw new Error(`/status returned HTTP ${res.status}`);
    const { status, row_count } = await res.json();
    if (status === 'ready') return;
    // cancelled and failed never produce an export, so stop polling.
    if (status === 'cancelled' || status === 'failed') {
      throw new Error(`Dataset ended in status=${status}`);
    }
    console.log(`Status=${status} rows=${row_count}, polling again in 30s`);
    await new Promise((r) => setTimeout(r, 30_000));
  }
};
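Once ready, download the NDJSON in a single request. The --fail flag makes curl exit with a non-zero status on the 409 and 500 responses instead of reporting success:
curl --fail -H "Authorization: Bearer $LABS_KEY" \
  -o dataset.jsonl \
  "$LABS_URL/api/v1/training-datasets/$ID/export"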

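The downloaded file can then be read one row per line. This is a minimal sketch, assuming Node.js and the file name dataset.jsonl from the curl example; parseNdjsonLine, parseNdjson, and readNdjsonFile are hypothetical helper names. Blank lines, such as a trailing newline, are skipped, and a parse failure reports its line number.

```typescript
import { createReadStream } from 'node:fs';
import { createInterface } from 'node:readline';

// Parse one NDJSON line; blank lines (e.g. the trailing newline) yield undefined.
function parseNdjsonLine(line: string, lineNo: number): unknown {
  const trimmed = line.trim();
  if (trimmed === '') return undefined;
  try {
    return JSON.parse(trimmed);
  } catch (err) {
    throw new Error(`Invalid JSON on line ${lineNo}: ${(err as Error).message}`);
  }
}

// Parse an export body that is already held in memory.
function parseNdjson(body: string): unknown[] {
  const rows: unknown[] = [];
  body.split(/\r?\n/).forEach((line, i) => {
    const row = parseNdjsonLine(line, i + 1);
    if (row !== undefined) rows.push(row);
  });
  return rows;
}

// Stream rows from a file on disk without loading the whole export into memory.
async function* readNdjsonFile(path: string): AsyncGenerator<unknown> {
  const rl = createInterface({ input: createReadStream(path), crlfDelay: Infinity });
  let lineNo = 0;
  for await (const line of rl) {
    lineNo += 1;
    const row = parseNdjsonLine(line, lineNo);
    if (row !== undefined) yield row;
  }
}
```

For large exports, iterating with for await (const row of readNdjsonFile('dataset.jsonl')) keeps memory flat; parseNdjson is simpler when the body is small.
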
NDJSON Row Shape

Each line of the export is a single JSON object representing one completed rollout:
type TrainingRow = {
  scenario_id: string;
  agent_template_id: string;
  messages: Array<{
    role: 'user' | 'assistant';
    content: string;
    thinking?: string; // present on assistant messages that emitted a <thinking> block
  }>;
  metadata: {
    temperature: number;
    rollout_index: number;
    turn_count: number; // count of assistant messages
    quality_score: number | null;
    complexity_score: number | null;
    ifd_score: number | null;
  };
};
A worked row:
{"scenario_id":"sc_marathon_build_001","agent_template_id":"agt_coach_v3","messages":[{"role":"user","content":"Hi Luke, here's my update for week 3. I hit all five runs and my long run felt strong."},{"role":"assistant","content":"That's exactly the response we wanted. Hold pace for week 4 - same volume, same intensities."},{"role":"user","content":"Week 4 update: missed Tuesday run, slept badly. Monday and Wednesday felt fine."},{"role":"assistant","content":"<no_response/>"}],"metadata":{"temperature":0.7,"rollout_index":0,"turn_count":2,"quality_score":0.84,"complexity_score":0.62,"ifd_score":null}}

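JSON.parse gives no guarantee that a row actually matches TrainingRow, so it can be worth checking the documented invariants before training. The sketch below assumes rows have already been parsed; checkTrainingRow is a hypothetical helper, and it checks only what this page states: turn_count equals the number of assistant messages, roles are user or assistant, and thinking appears only on assistant messages.

```typescript
type TrainingMessage = { role: 'user' | 'assistant'; content: string; thinking?: string };

type TrainingRow = {
  scenario_id: string;
  agent_template_id: string;
  messages: TrainingMessage[];
  metadata: {
    temperature: number;
    rollout_index: number;
    turn_count: number;
    quality_score: number | null;
    complexity_score: number | null;
    ifd_score: number | null;
  };
};

// Returns human-readable problems; an empty array means every check passed.
function checkTrainingRow(row: TrainingRow): string[] {
  const problems: string[] = [];
  const assistantCount = row.messages.filter((m) => m.role === 'assistant').length;
  // turn_count is documented as the count of assistant messages.
  if (row.metadata.turn_count !== assistantCount) {
    problems.push(`turn_count=${row.metadata.turn_count} but found ${assistantCount} assistant messages`);
  }
  row.messages.forEach((m, i) => {
    // Parsed JSON can carry roles the type does not allow.
    if (m.role !== 'user' && m.role !== 'assistant') {
      problems.push(`messages[${i}] has unknown role ${String(m.role)}`);
    }
    // thinking is only documented on assistant messages.
    if (m.thinking !== undefined && m.role !== 'assistant') {
      problems.push(`messages[${i}] is not an assistant message but has a thinking field`);
    }
  });
  return problems;
}
```

Run against the worked row above, checkTrainingRow returns an empty array.
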
Canonical Tokens

The platform uses self-closing XML-style tokens to mark moments that are not natural-language utterances but still carry training signal. The model learns to emit these tokens as discrete targets.
Token | Role | Meaning
<no_response/> | assistant | Operator deliberately chose silence at a reach-out moment. First-class learnable target, not an absence of data.
thinking field | assistant | Raw <thinking> block from the persona-runtime prompt, surfaced as a sibling field on the message. The platform does not interpret it.
More tokens will be added as additional operator behaviours become exportable. Treat the token list as additive; existing tokens will not change shape.
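
Because the token list is additive, a consumer can recognise canonical tokens by their self-closing shape rather than by a fixed list, and flag names it has not seen before. This sketch assumes a token always makes up the entire content of an assistant message, as in the worked row, and that token names use lowercase letters and underscores like no_response; canonicalToken and tokenSummary are hypothetical names. The thinking field is a sibling property rather than a token, so it is ignored here.

```typescript
type Message = { role: 'user' | 'assistant'; content: string; thinking?: string };

// Assumption: token names are lowercase letters and underscores, like <no_response/>.
const CANONICAL_TOKEN = /^<([a-z_]+)\/>$/;

// Tokens this consumer understands; newer tokens may appear in later exports.
const KNOWN_TOKENS = new Set<string>(['no_response']);

// Name of the canonical token if it is the message's entire content, otherwise null.
function canonicalToken(content: string): string | null {
  const match = CANONICAL_TOKEN.exec(content.trim());
  return match?.[1] ?? null;
}

// Count canonical tokens on assistant messages and collect names not in KNOWN_TOKENS.
function tokenSummary(messages: Message[]): { counts: Map<string, number>; unknown: string[] } {
  const counts = new Map<string, number>();
  const unknown: string[] = [];
  for (const m of messages) {
    if (m.role !== 'assistant') continue;
    const token = canonicalToken(m.content);
    if (token === null) continue;
    counts.set(token, (counts.get(token) ?? 0) + 1);
    if (!KNOWN_TOKENS.has(token) && !unknown.includes(token)) unknown.push(token);
  }
  return { counts, unknown };
}
```

Reporting unknown tokens instead of rejecting the row matches the promise that existing tokens keep their shape while new ones are added.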

Why no-response is its own token

The pacing of a coaching trajectory is part of what is being trained. When an operator chooses to let a check-in pass without intervening, that choice itself is the lesson, not the absence of one. Encoding silence as <no_response/> rather than as missing data lets the operator-side model learn “the right answer here was silence” instead of inferring it from message gaps. The persona-runtime side handles the same signal differently: silence is merged into the next time-skip’s CONTEXT block, so the persona sees “you have not heard back; N days have passed” rather than a literal token. The training-export representation and the runtime representation are deliberately split for this reason.
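
The split can be illustrated with two small functions over the same operator decision. This is a sketch under stated assumptions: OperatorDecision, toTrainingMessage, and toRuntimeContextNote are hypothetical, and the exact wording and structure of the runtime CONTEXT block are not specified on this page, so the note text only paraphrases it.

```typescript
// Hypothetical record of what the operator did at one reach-out moment.
type OperatorDecision =
  | { kind: 'reply'; content: string }
  | { kind: 'silence' };

type AssistantMessage = { role: 'assistant'; content: string };

// Training export: every decision, including silence, becomes an assistant target.
function toTrainingMessage(decision: OperatorDecision): AssistantMessage {
  return {
    role: 'assistant',
    content: decision.kind === 'silence' ? '<no_response/>' : decision.content,
  };
}

// Persona runtime: silence emits no message; it contributes a note to the next
// time-skip's CONTEXT block instead. The wording is illustrative only.
function toRuntimeContextNote(decision: OperatorDecision, daysSinceLastContact: number): string | null {
  if (decision.kind !== 'silence') return null;
  return `You have not heard back; ${daysSinceLastContact} days have passed.`;
}
```

The same silent decision thus yields a learnable <no_response/> target on the export side and only a passage-of-time note on the runtime side.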

Endpoints