Tool Calls
Many scenarios include tools that the model can invoke. Tools represent real-world actions such as creating documents or making decisions.
Artifacts
Artifacts are documents or structured outputs the model creates:
{
  "type": "artifact",
  "name": "medical_assessment",
  "input": {
    "condition": "lumbar_strain",
    "severity": "moderate",
    "work_restrictions": ["no_heavy_lifting", "seated_work_only"],
    "follow_up_date": "2024-02-15"
  }
}
Examples: medical assessments, case notes, reports, recommendations
Decisions
Decisions are choices from predefined options:
{
  "type": "decision",
  "name": "claim_status",
  "input": {
    "value": "approved"
  }
}
Examples: approve/deny decisions, priority levels, category assignments
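The two call shapes above differ only in their "type" and in what goes under "input". As a minimal sketch, the helpers below build each shape as a plain Python dict. The function names `artifact_call` and `decision_call` are hypothetical and are not part of any documented SDK.

```python
from typing import Any, Dict


def artifact_call(name: str, fields: Dict[str, Any]) -> Dict[str, Any]:
    """Build an artifact tool call: a structured document under "input"."""
    return {"type": "artifact", "name": name, "input": dict(fields)}


def decision_call(name: str, value: str) -> Dict[str, Any]:
    """Build a decision tool call: a single chosen value under "input"."""
    return {"type": "decision", "name": name, "input": {"value": value}}


# Reproduce the two examples shown above.
assessment = artifact_call(
    "medical_assessment",
    {
        "condition": "lumbar_strain",
        "severity": "moderate",
        "work_restrictions": ["no_heavy_lifting", "seated_work_only"],
        "follow_up_date": "2024-02-15",
    },
)
status = decision_call("claim_status", "approved")
```

Keeping the two builders separate makes it harder to send a free-form object where a single option value is expected.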
When you create an episode, the response lists the tools available for that scenario:
{
  "tools": [
    {
      "id": "medical_assessment",
      "type": "artifact",
      "name": "Medical Assessment",
      "description": "Create a medical assessment for the claimant",
      "mode": "json_schema",
      "json_schema": {
        "type": "object",
        "properties": {
          "condition": { "type": "string" },
          "severity": { "enum": ["mild", "moderate", "severe"] },
          "work_restrictions": {
            "type": "array",
            "items": { "type": "string" }
          }
        },
        "required": ["condition", "severity"]
      }
    },
    {
      "id": "claim_status",
      "type": "decision",
      "name": "Claim Status",
      "mode": "options",
      "options": ["approved", "denied", "pending_info"]
    }
  ]
}
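One way to work with this response is to index the tools by "id" and read each tool's constraints from there. The sketch below assumes the response has already been parsed into a Python dict shaped like the example above. The variable `tools_response` and the function `describe_constraints` are hypothetical names used only for illustration.

```python
from typing import Any, Dict

# Parsed episode response, trimmed to the fields used here (assumed shape).
tools_response: Dict[str, Any] = {
    "tools": [
        {
            "id": "medical_assessment",
            "type": "artifact",
            "mode": "json_schema",
            "json_schema": {
                "type": "object",
                "properties": {
                    "condition": {"type": "string"},
                    "severity": {"enum": ["mild", "moderate", "severe"]},
                },
                "required": ["condition", "severity"],
            },
        },
        {
            "id": "claim_status",
            "type": "decision",
            "mode": "options",
            "options": ["approved", "denied", "pending_info"],
        },
    ]
}

# Look tools up by id rather than scanning the list each time.
tools_by_id = {tool["id"]: tool for tool in tools_response["tools"]}


def describe_constraints(tool: Dict[str, Any]) -> Dict[str, Any]:
    """Summarise what a tool accepts, based on its "mode"."""
    if tool["mode"] == "options":
        return {"allowed_values": list(tool["options"])}
    if tool["mode"] == "json_schema":
        return {"required_fields": list(tool["json_schema"].get("required", []))}
    raise ValueError(f"unrecognised mode: {tool['mode']!r}")


claim_constraints = describe_constraints(tools_by_id["claim_status"])
assessment_constraints = describe_constraints(tools_by_id["medical_assessment"])
```

Only the two modes shown in the example are handled. Any other mode raises an error rather than being guessed at.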
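Tool calls are attached to the assistant message that invokes them. In the examples on this page, a call's "name" matches the tool's "id" (medical_assessment), not its display name: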
{
  "messages": [
    {
      "role": "assistant",
      "content": "Based on our conversation, I'm creating an assessment.",
      "tool_calls": [
        {
          "type": "artifact",
          "name": "medical_assessment",
          "input": {
            "condition": "lumbar_strain",
            "severity": "moderate"
          }
        }
      ]
    }
  ]
}
See the Quickstart for full code examples.
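As a rough sketch of how such a message might be assembled in code, the helper below attaches tool calls to an assistant message and serialises the payload to JSON. The function name `assistant_message` is hypothetical, and how the payload is submitted (endpoint, client, authentication) is not covered here.

```python
import json
from typing import Any, Dict, List, Optional


def assistant_message(
    content: str, tool_calls: Optional[List[Dict[str, Any]]] = None
) -> Dict[str, Any]:
    """Build an assistant message, adding "tool_calls" only when provided."""
    message: Dict[str, Any] = {"role": "assistant", "content": content}
    if tool_calls:
        message["tool_calls"] = list(tool_calls)
    return message


payload = {
    "messages": [
        assistant_message(
            "Based on our conversation, I'm creating an assessment.",
            [
                {
                    "type": "artifact",
                    "name": "medical_assessment",
                    "input": {"condition": "lumbar_strain", "severity": "moderate"},
                }
            ],
        )
    ]
}

# JSON body matching the structure shown above.
body = json.dumps(payload, indent=2)
```

A message with no tool calls omits the "tool_calls" key entirely, so ordinary conversational turns stay minimal.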
Make tool calls only after gathering sufficient information. A decision made too early can result in a low reward.
Tool calls are graded as part of the reward calculation. The grading considers:
- Whether the tool call was appropriate for the scenario
- Whether the inputs match what the scenario expected
- The timing of the tool call relative to information gathered
Tool inputs are not strictly validated against their schemas at submission time. Instead, incorrect or inappropriate tool calls affect your reward score.
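Because inputs are not strictly validated on submission, it can help to check them locally before sending. The sketch below is a hand-rolled check covering only the constraints that appear in the examples on this page: "required", "enum", and "options". It is not a full JSON Schema validator, and `check_input` is a hypothetical helper name.

```python
from typing import Any, Dict, List


def check_input(tool: Dict[str, Any], tool_input: Dict[str, Any]) -> List[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems: List[str] = []

    # Decision tools: the value must be one of the predefined options.
    if tool.get("mode") == "options":
        value = tool_input.get("value")
        if value not in tool["options"]:
            problems.append(f"value {value!r} is not one of {tool['options']}")
        return problems

    # Artifact tools: check required fields and enum-constrained fields.
    schema = tool.get("json_schema", {})
    for field in schema.get("required", []):
        if field not in tool_input:
            problems.append(f"missing required field {field!r}")
    properties = schema.get("properties", {})
    for field, value in tool_input.items():
        spec = properties.get(field)
        if spec is None:
            continue  # fields not declared in the schema are left alone
        if "enum" in spec and value not in spec["enum"]:
            problems.append(f"{field}={value!r} is not one of {spec['enum']}")
    return problems


assessment_tool = {
    "id": "medical_assessment",
    "mode": "json_schema",
    "json_schema": {
        "properties": {
            "condition": {"type": "string"},
            "severity": {"enum": ["mild", "moderate", "severe"]},
        },
        "required": ["condition", "severity"],
    },
}
claim_tool = {
    "id": "claim_status",
    "mode": "options",
    "options": ["approved", "denied", "pending_info"],
}

ok = check_input(assessment_tool, {"condition": "lumbar_strain", "severity": "moderate"})
bad = check_input(assessment_tool, {"severity": "extreme"})
bad_decision = check_input(claim_tool, {"value": "maybe"})
```

Passing this check says nothing about whether the call was appropriate or well timed for the scenario. It only catches structural mistakes.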