Training Datasets
A training dataset is a collection of completed scenario rollouts, packaged as NDJSON for supervised fine-tuning. Each row is one capture session, rendered as a sequence ofuser and assistant messages with metadata describing how that rollout was produced.
The model being trained takes the practitioner role. Operators interacting with simulated personas during scenario play generate the assistant messages; the simulated persona generates the user messages. Operator-driven moments like “deliberately stay silent” are surfaced as discrete learnable targets via canonical tokens (see Canonical Tokens below).
Lifecycle
| Status | /export behaviour |
|---|---|
pending, preview, running, filtering | 202 Accepted with status payload. Poll /status until ready. |
ready | 200 OK with Content-Type: application/x-ndjson. Body streams one row per line. |
cancelled | 409 Conflict with status payload. No final NDJSON. |
failed | 500 Internal Server Error with status payload including a stored error string. |
Polling Pattern
/status is cheap and safe to call repeatedly. /export is the heavy NDJSON download and is only meaningful when status='ready'.
ready, download the NDJSON in a single request:
NDJSON Row Shape
Each line of the export is a single JSON object representing one completed rollout:Canonical Tokens
The platform uses self-closing XML-style tokens to mark moments that are not natural-language utterances but still carry training signal. The model learns to emit these tokens as discrete targets.| Token | Role | Meaning |
|---|---|---|
<no_response/> | assistant | Operator deliberately chose silence at a reach-out moment. First-class learnable target, not an absence of data. |
thinking field | assistant | Raw <thinking> block from the persona-runtime prompt, surfaced as a sibling field on the message. The platform does not interpret it. |
Why no-response is its own token
The pacing of a coaching trajectory is part of what is being trained. When an operator chooses to let a check-in pass without intervening, that choice itself is the lesson, not the absence of one. Encoding silence as<no_response/> rather than as missing data lets the operator-side model learn “the right answer here was silence” instead of inferring it from message gaps. The persona-runtime side handles the same signal differently: silence is merged into the next time-skip’s CONTEXT block, so the persona sees “you have not heard back; N days have passed” rather than a literal token. The training-export representation and the runtime representation are deliberately split for this reason.