Frequently Asked Questions
General
What is Labs?
How is Labs different from RLHF?
| Aspect | RLHF | Labs |
|---|---|---|
| Reward source | Human annotators | Automated evaluation |
| Scalability | Limited by labeler availability | Scales with available compute |
| Cost | High (human labor) | Lower (compute only) |
| Consistency | Variable (human disagreement) | Deterministic |
| Scoring | Based on human preference | Calibrated, verifiable |
What domains do you support?
- Case Management: Insurance claims, workers’ compensation
- Healthcare: Patient consultations, symptom investigation
- Finance: Risk assessment, compliance
Can I use Labs with my own scenarios?
API & Integration
What's the maximum batch size?
The `/api/v1/batch/evaluate` endpoint accepts up to 100 items per request. For larger batches, split into multiple requests.
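Splitting a larger workload to fit the 100-item limit can be done client-side before any request is sent. The sketch below shows only the chunking step; `chunk_items` and `MAX_BATCH_ITEMS` are illustrative names, not part of the Labs API, and the request itself is left out because the payload format is not described here.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")

MAX_BATCH_ITEMS = 100  # per-request limit stated in the answer above


def chunk_items(items: Sequence[T], size: int = MAX_BATCH_ITEMS) -> Iterator[List[T]]:
    """Yield consecutive slices of at most `size` items, preserving order."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


# Example: 250 items become three requests of 100, 100, and 50 items.
batches = list(chunk_items(list(range(250))))
print([len(b) for b in batches])  # [100, 100, 50]
```

Each yielded list can then be sent as one batch request, which keeps every call within the documented limit.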
What are the rate limits?
- Standard: 60 requests/minute
- Premium: 300 requests/minute
- Enterprise: Custom limits
Check the `X-RateLimit-*` response headers for current limits.
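A client can use these headers to pause before it exceeds its quota. The sketch below is a minimal example under assumptions: the exact header names (`X-RateLimit-Remaining`, `X-RateLimit-Reset`) and the meaning of the reset value (a Unix timestamp in seconds) follow a common convention and are not confirmed for this API, so check an actual response before relying on them.

```python
import time
from typing import Mapping, Optional


def seconds_until_retry(headers: Mapping[str, str], now: Optional[float] = None) -> float:
    """Return how long to pause before the next request (0.0 if no pause is needed).

    Assumes X-RateLimit-Remaining is an integer count and X-RateLimit-Reset
    is a Unix timestamp in seconds. Both names are assumptions.
    """
    now = time.time() if now is None else now
    lowered = {key.lower(): value for key, value in headers.items()}
    try:
        remaining = int(lowered["x-ratelimit-remaining"])
        reset_at = float(lowered["x-ratelimit-reset"])
    except (KeyError, ValueError):
        return 0.0  # headers missing or unparseable: do not guess a delay
    if remaining > 0:
        return 0.0
    return max(0.0, reset_at - now)


# Example with made-up header values: quota exhausted, window resets at t=1060.
example_headers = {
    "X-RateLimit-Limit": "60",
    "X-RateLimit-Remaining": "0",
    "X-RateLimit-Reset": "1060",
}
print(seconds_until_retry(example_headers, now=1000.0))  # 60.0
```

Passing `now` explicitly keeps the function easy to test; in normal use it can be omitted.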
Do episodes expire?
Can I replay an episode?
What languages are supported?
Training
What training frameworks are supported?
- Hugging Face TRL (GRPO)
- OpenRLHF
- NVIDIA NeMo-Aligner
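As an illustration of how an external score might be plugged into one of these frameworks, the sketch below builds a reward callable in the shape that TRL's `GRPOTrainer` expects for custom `reward_funcs`: it receives `completions` plus extra keyword arguments and returns one float per completion. Everything else is an assumption: `make_reward_func` and `toy_score` are hypothetical names, the scorer is a stand-in for a real call to Labs, completions are assumed to be plain strings, and clamping to the 0 to 1 range reflects the reward range described in this FAQ.

```python
from typing import Callable, List


def make_reward_func(score_fn: Callable[[str], float]) -> Callable[..., List[float]]:
    """Wrap a per-text scorer into a batch reward function.

    The returned function takes `completions` (assumed to be plain strings)
    and ignores any extra keyword arguments the trainer passes along.
    """
    def reward_func(completions: List[str], **kwargs) -> List[float]:
        # Clamp each score to [0, 1] so a misbehaving scorer cannot
        # produce rewards outside the expected range.
        return [min(1.0, max(0.0, float(score_fn(text)))) for text in completions]

    return reward_func


def toy_score(text: str) -> float:
    """Stand-in scorer (hypothetical): one tenth of a point per word."""
    return len(text.split()) / 10.0


reward_func = make_reward_func(toy_score)
print(reward_func(completions=["request the claim form", ""]))  # [0.4, 0.0]
```

In a real setup, `toy_score` would be replaced by a function that sends the completion for evaluation and returns the resulting reward.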
Should I use dense or sparse rewards?
How many episodes do I need for training?
- Model size (larger models need more data)
- Scenario complexity
- Desired performance level
Can I train on multiple scenarios simultaneously?
How do I know if my model is learning?
- Average reward per episode (should increase)
- Episode completion rate (should increase)
- Turns to completion (should decrease)
- Low reward rate (should decrease)
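The four metrics above can be computed from a simple log of finished episodes. The sketch below is one possible way to do that; the `EpisodeRecord` fields and the `low_reward_threshold` default of 0.2 are hypothetical choices for illustration, not values defined by Labs.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class EpisodeRecord:
    reward: float    # reward for the episode, assumed to lie in [0, 1]
    turns: int       # number of turns the episode took
    completed: bool  # whether the episode reached completion


def summarize(episodes: List[EpisodeRecord], low_reward_threshold: float = 0.2) -> Dict[str, float]:
    """Compute the four training-health metrics for one evaluation window."""
    if not episodes:
        raise ValueError("need at least one episode")
    completed = [e for e in episodes if e.completed]
    return {
        "avg_reward": mean(e.reward for e in episodes),
        "completion_rate": len(completed) / len(episodes),
        # Turns are averaged over completed episodes only; NaN if none completed.
        "avg_turns_to_completion": mean(e.turns for e in completed) if completed else float("nan"),
        "low_reward_rate": sum(e.reward < low_reward_threshold for e in episodes) / len(episodes),
    }


# Made-up sample window of four episodes.
window = [
    EpisodeRecord(reward=0.8, turns=5, completed=True),
    EpisodeRecord(reward=0.1, turns=12, completed=False),
    EpisodeRecord(reward=0.6, turns=7, completed=True),
    EpisodeRecord(reward=0.5, turns=6, completed=True),
]
print(summarize(window))
```

Comparing these summaries across successive windows shows whether each metric is moving in the expected direction.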
Rewards & Evaluation
What's the reward range?
- Higher rewards (closer to 1): Good progress
- Lower rewards (closer to 0): Neutral or incorrect actions
Why did I get a low reward?
- Harmful recommendations
- Incorrect conclusions
- Off-topic responses
- Tool misuse
Are rewards deterministic?
What's the compare endpoint for?
The `/api/v1/compare` endpoint is for preference-based training such as DPO. It compares two responses and reports which is better, along with the margin of difference.
Data & Privacy
Do you store my model's outputs?
Is my training data safe?
- All communication is over HTTPS
- API keys are hashed, not stored in plain text
- We don’t share data between organizations
- SOC 2 compliance (in progress)
Can I export my training history?
Billing & Plans
How is usage billed?
- Number of API calls
- Number of episodes created
- Batch items evaluated
Is there a free tier?
Can I upgrade my plan?