FAQ

General

What is Labs?

Labs is a remote reward server for RL training. It evaluates model behavior against scenarios and returns automated, calibrated reward signals. Unlike traditional RLHF, which requires human labelers, Labs scores responses automatically, so the training signal scales with compute rather than labeler availability.

How is Labs different from RLHF?

Aspect	RLHF	Labs
Reward source	Human annotators	Automated evaluation
Scalability	Limited by labeler availability	Limited only by compute and plan rate limits
Cost	High (human labor)	Lower (compute only)
Consistency	Variable (human disagreement)	Deterministic for a given scenario state and action
Scoring	Based on human preference	Calibrated, verifiable

What domains do you support?

Current collections include:

Case Management: Insurance claims, workers’ compensation
Healthcare: Patient consultations, symptom investigation
Finance: Risk assessment, compliance

We’re continuously adding new domains. Contact us if you have specific needs.

Can I use Labs with my own scenarios?

Currently, Labs uses our curated scenario library. Custom scenario support is on our roadmap. Contact us if you’re interested in this capability.

API & Integration

What's the maximum batch size?

The /api/v1/batch/evaluate endpoint accepts up to 100 items per request. For larger batches, split into multiple requests.
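A minimal sketch of splitting a larger workload into requests of at most 100 items. The `chunk_items` helper and the `{"id": ...}` item shape are illustrative assumptions; only the 100-item limit comes from the answer above.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")

MAX_BATCH_SIZE = 100  # documented limit for /api/v1/batch/evaluate


def chunk_items(items: Sequence[T], size: int = MAX_BATCH_SIZE) -> Iterator[List[T]]:
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


# Hypothetical payloads; the real item schema is defined by the API reference.
items = [{"id": i} for i in range(250)]
batches = list(chunk_items(items))
```

Each element of `batches` can then be sent as its own request, subject to the rate limits of your plan.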

What are the rate limits?

Rate limits depend on your subscription:

Standard: 60 requests/minute
Premium: 300 requests/minute
Enterprise: Custom limits

Check X-RateLimit-* headers for current limits.
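As a rough sketch, the rate-limit headers can be pulled out of any response's header mapping as shown below. The helper name and the specific header suffixes in the example (`Limit`, `Remaining`) are assumptions; the answer above only states that the headers share the `X-RateLimit-` prefix.

```python
from typing import Dict, Mapping


def rate_limit_headers(headers: Mapping[str, str]) -> Dict[str, str]:
    """Return every X-RateLimit-* header, keyed by its lower-cased suffix."""
    prefix = "x-ratelimit-"
    return {
        name.lower()[len(prefix):]: value
        for name, value in headers.items()
        if name.lower().startswith(prefix)
    }


# Example headers; the suffixes shown here are assumed, not documented names.
example = {
    "Content-Type": "application/json",
    "X-RateLimit-Limit": "60",
    "X-RateLimit-Remaining": "12",
}
limits = rate_limit_headers(example)
```

Matching on the prefix rather than on fixed names keeps the helper working whichever `X-RateLimit-*` headers the server actually returns.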

Do episodes expire?

Yes, episodes expire after 24 hours of inactivity. For long-running training jobs, create new episodes as needed rather than keeping old ones open.
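One way to avoid calling an expired episode is to track inactivity on the client side. The sketch below is illustrative: the `EpisodeClock` class is hypothetical, and it only estimates expiry from the documented 24-hour window, so the server remains the source of truth.

```python
import time
from typing import Optional

EPISODE_TTL_SECONDS = 24 * 60 * 60  # documented: 24 hours of inactivity


class EpisodeClock:
    """Tracks the last activity time for one episode on the client side."""

    def __init__(self, now: Optional[float] = None) -> None:
        self.last_activity = time.time() if now is None else now

    def touch(self, now: Optional[float] = None) -> None:
        """Record activity; call this after each successful request."""
        self.last_activity = time.time() if now is None else now

    def is_expired(self, now: Optional[float] = None) -> bool:
        """True once the inactivity window has fully elapsed."""
        current = time.time() if now is None else now
        return current - self.last_activity >= EPISODE_TTL_SECONDS


# Explicit timestamps make the behavior easy to follow.
clock = EpisodeClock(now=0.0)
fresh = clock.is_expired(now=3600.0)      # one hour idle
stale = clock.is_expired(now=86400.0)     # a full 24 hours idle
```

When `is_expired()` returns true, create a new episode instead of reusing the old one.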

Can I replay an episode?

No. Each episode is unique: simulated responses are generated fresh each time, which prevents memorization and keeps evaluation fair.

What languages are supported?

Currently, all scenarios are in English. Multi-language support is planned for future releases.

Training

What training frameworks are supported?

Labs works with any framework that can make HTTP requests. We have specific guides for:

Hugging Face TRL (GRPO)
OpenRLHF
NVIDIA NeMo-Aligner

See our Integration Guides for details.

Should I use dense or sparse rewards?

Start with dense rewards (per-turn). They provide more signal and are easier to learn from. Once your model shows improvement, you can experiment with sparse rewards (cumulative at episode end) for harder scenarios.
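To make the distinction concrete, the sketch below converts a list of per-turn (dense) rewards into a sparse signal on the client side. It assumes "cumulative" means the sum of the per-turn rewards delivered on the final turn; how Labs itself aggregates sparse rewards is not specified here, and `to_sparse` is a hypothetical helper.

```python
from typing import List


def to_sparse(per_turn_rewards: List[float]) -> List[float]:
    """Move all reward to the final turn; earlier turns receive 0.0."""
    if not per_turn_rewards:
        return []
    total = sum(per_turn_rewards)  # assumed aggregation: plain sum
    return [0.0] * (len(per_turn_rewards) - 1) + [total]


dense = [0.25, 0.5, 0.75]
sparse = to_sparse(dense)
```

Note that a summed reward can exceed 1; divide by the number of turns if your trainer expects values in [0, 1].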

How many episodes do I need for training?

It depends on:

Model size (larger models need more data)
Scenario complexity
Desired performance level

A rough guideline: start with 1,000 to 10,000 episodes, then scale up or down based on your learning curves.

Can I train on multiple scenarios simultaneously?

Yes, and it’s recommended! Training on a mix of scenarios improves generalization and prevents overfitting to specific patterns.

How do I know if my model is learning?

Track these metrics:

Average reward per episode (should increase)
Episode completion rate (should increase)
Turns to completion (should decrease)
Low reward rate (should decrease)
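The metrics above could be computed from locally logged episodes roughly as follows. The `EpisodeRecord` shape, the 0.2 low-reward threshold, and the choice to define "average reward per episode" as the mean of per-episode reward totals are all assumptions for illustration.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List

LOW_REWARD_THRESHOLD = 0.2  # assumed cutoff; tune for your scenarios


@dataclass
class EpisodeRecord:
    rewards: List[float]  # one reward per turn, each in [0, 1]
    completed: bool       # whether the episode reached completion


def summarize(episodes: List[EpisodeRecord]) -> Dict[str, float]:
    """Compute the four tracking metrics over a batch of episodes."""
    if not episodes or not any(e.rewards for e in episodes):
        raise ValueError("need at least one episode with rewards")
    completed = [e for e in episodes if e.completed]
    all_rewards = [r for e in episodes for r in e.rewards]
    low = sum(1 for r in all_rewards if r < LOW_REWARD_THRESHOLD)
    return {
        "avg_reward_per_episode": mean(sum(e.rewards) for e in episodes),
        "completion_rate": len(completed) / len(episodes),
        # Only completed episodes count toward turns to completion.
        "avg_turns_to_completion": (
            mean(len(e.rewards) for e in completed) if completed else float("nan")
        ),
        "low_reward_rate": low / len(all_rewards),
    }


stats = summarize([
    EpisodeRecord(rewards=[0.5, 0.5], completed=True),
    EpisodeRecord(rewards=[0.0, 0.5, 0.5, 0.5], completed=False),
])
```

Computing these per training step and plotting them over time makes the expected trends easy to check.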

Rewards & Evaluation

What's the reward range?

All rewards are normalized to [0, 1]:

Higher rewards (closer to 1): Good progress
Lower rewards (closer to 0): Neutral or incorrect actions

Why did I get a low reward?

Low rewards (close to 0) usually indicate problematic behavior, such as:

Harmful recommendations
Incorrect conclusions
Off-topic responses
Tool misuse

Log your episodes to debug which actions resulted in low scores.
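A small sketch of that debugging step, assuming each logged turn is a dict with `action` and `reward` keys. Both the log shape and the 0.2 threshold are hypothetical choices, not part of the Labs API.

```python
from typing import Dict, List, Tuple


def low_reward_turns(
    episode_log: List[Dict], threshold: float = 0.2
) -> List[Tuple[int, str, float]]:
    """Return (turn index, action, reward) for every turn scoring below threshold."""
    return [
        (index, turn["action"], turn["reward"])
        for index, turn in enumerate(episode_log)
        if turn["reward"] < threshold
    ]


# Hypothetical log of one episode.
log = [
    {"action": "ask about symptoms", "reward": 0.75},
    {"action": "recommend unrelated product", "reward": 0.0},
    {"action": "summarize findings", "reward": 0.5},
]
flagged = low_reward_turns(log)
```

Reviewing the flagged actions against the categories listed above usually narrows down the cause.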

Are rewards deterministic?

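Yes. For the same scenario state and action, the reward is deterministic. However, simulated persona responses may vary slightly to prevent gaming, so two runs of the same scenario can reach different states and therefore receive different rewards.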

What's the compare endpoint for?

The /api/v1/compare endpoint is for preference-based training like DPO. It compares two responses and tells you which is better, plus the margin of difference.
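As an illustration, a comparison result could be turned into a preference record for DPO-style training like this. The response fields `preferred` and `margin` are assumed names, not the documented schema; check the API reference for the actual response format. The output uses the prompt/chosen/rejected layout commonly used for preference datasets.

```python
from typing import Dict, Optional


def to_preference_pair(
    prompt: str,
    response_a: str,
    response_b: str,
    result: Dict,
    min_margin: float = 0.0,
) -> Optional[Dict]:
    """Build a prompt/chosen/rejected record, or None if the margin is too small.

    `result` is assumed to look like {"preferred": "a" or "b", "margin": float}.
    """
    if result["margin"] < min_margin:
        return None  # skip near-ties, which carry little preference signal
    a_wins = result["preferred"] == "a"
    return {
        "prompt": prompt,
        "chosen": response_a if a_wins else response_b,
        "rejected": response_b if a_wins else response_a,
        "margin": result["margin"],
    }


# Hypothetical comparison result.
pair = to_preference_pair(
    "Summarize the claim.", "Draft A", "Draft B", {"preferred": "b", "margin": 0.3}
)
```

Filtering on `min_margin` is a design choice for dropping pairs where the two responses are nearly equivalent.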

Data & Privacy

Do you store my model's outputs?

We log API requests for debugging and abuse prevention. Logs are retained for 30 days and are not used to train our own models. Enterprise plans can opt out of logging.

Is my training data safe?

Yes:

All communication is over HTTPS
API keys are hashed, not stored in plain text
We don’t share data between organizations
SOC 2 compliance is in progress

Can I export my training history?

Currently, we don’t provide bulk export of historical episodes. For auditability, we recommend logging episodes on your side during training.
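A minimal sketch of client-side episode logging, writing one JSON object per line (JSONL) so the history can be appended during training and re-read later. The record fields are hypothetical; store whatever your audit requirements call for.

```python
import json
from pathlib import Path
from typing import Dict, List, Union

PathLike = Union[str, Path]


def append_episode(path: PathLike, record: Dict) -> None:
    """Append one episode record as a single JSON line."""
    with Path(path).open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record, sort_keys=True) + "\n")


def read_episodes(path: PathLike) -> List[Dict]:
    """Load every logged episode, skipping blank lines."""
    with Path(path).open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]
```

For example, `append_episode("episodes.jsonl", {"episode_id": "ep-1", "rewards": [0.5, 0.75]})` adds one line to the file, and `read_episodes("episodes.jsonl")` returns all records in the order they were written.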

Billing & Plans

How is usage billed?

Usage is tracked by:

Number of API calls
Number of episodes created
Batch items evaluated

Check your usage in the Labs Portal dashboard.

Is there a free tier?

Yes, new accounts get a free trial with limited API calls. Contact us for trial extensions for research purposes.

Can I upgrade my plan?

Yes, contact us through the Labs Portal to discuss plan upgrades and enterprise options.
