Frequently Asked Questions

General

Labs is a remote reward server for RL training: it provides automated, calibrated reward signals from scenario-based evaluation. Unlike traditional RLHF, which requires human labelers, Labs scores model behavior automatically with calibrated scoring, so the training signal scales without human annotation.
Aspect          RLHF                              Labs
Reward source   Human annotators                  Automated evaluation
Scalability     Limited by labeler availability   Not bound by labelers (plan rate limits apply)
Cost            High (human labor)                Lower (compute only)
Consistency     Variable (human disagreement)     Deterministic scoring
Scoring         Based on human preference         Calibrated, verifiable
Current collections include:
  • Case Management: Insurance claims, workers’ compensation
  • Healthcare: Patient consultations, symptom investigation
  • Finance: Risk assessment, compliance
We’re continuously adding new domains. Contact us if you have specific needs.
Labs currently uses only our curated scenario library; support for custom scenarios is on our roadmap. Contact us if you’re interested in this capability.

API & Integration

The /api/v1/batch/evaluate endpoint accepts up to 100 items per request. For larger batches, split into multiple requests.
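For example, here is a minimal Python sketch of the client-side split. It only handles chunking: the request body format for /api/v1/batch/evaluate is not described here, so sending each chunk is left to your HTTP client. `chunk_items` and `MAX_BATCH_ITEMS` are illustrative names, not part of the Labs API.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")

MAX_BATCH_ITEMS = 100  # per-request limit of /api/v1/batch/evaluate


def chunk_items(items: Sequence[T], size: int = MAX_BATCH_ITEMS) -> Iterator[List[T]]:
    """Yield consecutive slices of at most `size` items, preserving order."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


# Usage: 250 items become three requests of 100, 100 and 50 items.
batches = list(chunk_items(list(range(250))))
print([len(b) for b in batches])  # [100, 100, 50]
```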
Rate limits depend on your subscription:
  • Standard: 60 requests/minute
  • Premium: 300 requests/minute
  • Enterprise: Custom limits
Check the X-RateLimit-* response headers for your current limits.
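To stay under your plan’s limit without relying on retries, you can pace requests on the client. The sketch below is a simple sliding-window limiter in Python; the class name and its parameters are hypothetical, and the server’s X-RateLimit-* headers remain the authoritative source for your actual limits.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Client-side pacing: allow at most `max_requests` calls per `window` seconds."""

    def __init__(self, max_requests: int, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_requests = max_requests
        self.window = window
        self.clock = clock              # injectable so the limiter can be tested
        self._sent: Deque[float] = deque()  # timestamps of recent requests

    def wait_time(self) -> float:
        """Seconds to wait before the next request is allowed (0.0 if none)."""
        now = self.clock()
        # Forget requests that have left the window.
        while self._sent and now - self._sent[0] >= self.window:
            self._sent.popleft()
        if len(self._sent) < self.max_requests:
            return 0.0
        # Wait until the oldest request in the window expires.
        return self.window - (now - self._sent[0])

    def record(self) -> None:
        """Call right after sending a request."""
        self._sent.append(self.clock())


limiter = SlidingWindowLimiter(max_requests=60)  # Standard plan: 60 requests/minute
delay = limiter.wait_time()
if delay > 0:
    time.sleep(delay)
# ... send the request with your HTTP client here ...
limiter.record()
```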
Episodes expire after 24 hours of inactivity. For long-running training jobs, create new episodes as needed rather than keeping old ones open.
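A small Python helper can decide when to start a fresh episode. The 24-hour timeout comes from the answer above; the five-minute safety margin (refreshing slightly early so a request never races the server-side expiry) and the function name are assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

EPISODE_TTL = timedelta(hours=24)     # documented inactivity timeout
SAFETY_MARGIN = timedelta(minutes=5)  # assumption: refresh a little early


def needs_new_episode(last_activity: datetime, now: Optional[datetime] = None) -> bool:
    """True if an episode idle since `last_activity` should be replaced.

    Both datetimes must be timezone-aware (UTC recommended).
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_activity >= EPISODE_TTL - SAFETY_MARGIN


start = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(needs_new_episode(start, start + timedelta(hours=23)))  # False
print(needs_new_episode(start, start + timedelta(hours=25)))  # True
```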
Each episode is unique: simulated responses are generated fresh each time, which prevents memorization and ensures fair evaluation.
Currently, all scenarios are in English. Multi-language support is planned for future releases.

Training

Labs works with any framework that can make HTTP requests. We have specific guides for:
  • Hugging Face TRL (GRPO)
  • OpenRLHF
  • NVIDIA NeMo-Aligner
See our Integration Guides for details.
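As an illustration for TRL, a Labs call can be wrapped as a custom reward function for `GRPOTrainer`. This sketch assumes TRL’s convention that reward functions receive `prompts` and `completions` as keyword arguments and return one float per completion. The `evaluate` callable stands in for your own code that calls Labs (for example, the batch endpoint); its name and signature are hypothetical, so check the Integration Guides for the supported request format.

```python
from typing import Any, Callable, List, Sequence

# Hypothetical: your own function that sends prompt/completion pairs to Labs
# and returns one reward in [0, 1] per pair.
EvaluateFn = Callable[[Sequence[Any], Sequence[Any]], List[float]]


def make_reward_func(evaluate: EvaluateFn, batch_limit: int = 100) -> Callable[..., List[float]]:
    """Wrap a Labs evaluation call in the reward-function shape GRPOTrainer uses."""

    def labs_reward(prompts: Sequence[Any], completions: Sequence[Any], **kwargs: Any) -> List[float]:
        rewards: List[float] = []
        # Respect the 100-items-per-request batch limit.
        for start in range(0, len(completions), batch_limit):
            end = start + batch_limit
            rewards.extend(evaluate(prompts[start:end], completions[start:end]))
        if len(rewards) != len(completions):
            raise ValueError("expected exactly one reward per completion")
        return rewards

    return labs_reward


# Usage with a stand-in evaluator; replace it with a real call to Labs.
def fake_evaluate(prompts: Sequence[Any], completions: Sequence[Any]) -> List[float]:
    return [0.5] * len(completions)


reward_fn = make_reward_func(fake_evaluate, batch_limit=2)
rewards = reward_fn(prompts=["p1", "p2", "p3"], completions=["c1", "c2", "c3"])
# trainer = GRPOTrainer(model=..., reward_funcs=reward_fn, train_dataset=...)
```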
Start with dense rewards (per-turn). They provide more signal and are easier to learn from. Once your model shows improvement, you can experiment with sparse rewards (cumulative at episode end) for harder scenarios.
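The difference between the two settings can be shown on one episode’s rewards. In this sketch, sparse mode assigns the whole episode’s signal to the final turn. Using the mean of per-turn rewards as the episode score is an assumption made to keep the value in [0, 1]; Labs’ own cumulative score may be computed differently.

```python
from typing import List


def to_sparse(per_turn: List[float]) -> List[float]:
    """Zero every turn except the last, which receives the episode-level reward."""
    if not per_turn:
        return []
    # Assumption: episode reward = mean of turn rewards, so it stays in [0, 1].
    episode_reward = sum(per_turn) / len(per_turn)
    return [0.0] * (len(per_turn) - 1) + [episode_reward]


dense = [0.25, 0.5, 0.75]   # one reward per turn: signal at every step
sparse = to_sparse(dense)   # [0.0, 0.0, 0.5]: signal only at the end
```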
The amount of training data you need depends on:
  • Model size (larger models need more data)
  • Scenario complexity
  • Desired performance level
A rough guideline: start with 1,000-10,000 episodes and scale based on learning curves.
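One simple way to read a learning curve is to compare recent average rewards with the preceding period. The sketch below flags a plateau when the gain falls below a threshold; the window size and `min_gain` are arbitrary starting values, not Labs recommendations. If the curve is still rising when your episode budget runs out, more episodes are likely to help; a plateau is a signal to look at other levers before adding more.

```python
from statistics import mean
from typing import List


def has_plateaued(avg_rewards: List[float], window: int = 5, min_gain: float = 0.01) -> bool:
    """Heuristic: True if the mean of the last `window` evaluations improved on
    the previous `window` evaluations by less than `min_gain`."""
    if len(avg_rewards) < 2 * window:
        return False  # not enough history to judge
    recent = mean(avg_rewards[-window:])
    previous = mean(avg_rewards[-2 * window:-window])
    return recent - previous < min_gain


# Average reward per episode, measured at regular evaluation checkpoints.
rising = [0.30, 0.35, 0.40, 0.45, 0.50, 0.55, 0.60, 0.65, 0.70, 0.75]
flat = [0.61, 0.60, 0.62, 0.61, 0.60, 0.61, 0.62, 0.60, 0.61, 0.61]
```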
Training on a mix of scenarios is recommended: it improves generalization and prevents overfitting to specific patterns.
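A mixed training set can be planned by sampling scenarios in fixed proportions. The collection identifiers and weights below are hypothetical placeholders for whatever scenario IDs your account exposes; seeding the generator keeps the mix reproducible across runs.

```python
import random
from typing import Dict, List


def sample_scenarios(weights: Dict[str, float], n: int, seed: int = 0) -> List[str]:
    """Draw `n` scenario collections with replacement, in proportion to `weights`."""
    rng = random.Random(seed)  # private, seeded generator for a reproducible plan
    names = list(weights)
    return rng.choices(names, weights=[weights[name] for name in names], k=n)


# Hypothetical collection identifiers and proportions.
mix = {"case_management": 0.4, "healthcare": 0.4, "finance": 0.2}
episode_plan = sample_scenarios(mix, n=1000)
```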
To monitor training progress, track these metrics:
  • Average reward per episode (should increase)
  • Episode completion rate (should increase)
  • Turns to completion (should decrease)
  • Low reward rate (should decrease)
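These metrics can be computed from your own episode logs. The sketch assumes a hypothetical record with per-turn rewards and a completion flag, treats an episode’s reward as the mean of its turn rewards, and uses 0.2 as an illustrative cut-off for a low reward; adjust all three to match how you log and score episodes.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class Episode:
    rewards: List[float]  # per-turn rewards in [0, 1]; assumed non-empty
    completed: bool       # whether the episode reached its goal


def training_metrics(episodes: List[Episode], low_threshold: float = 0.2) -> Dict[str, float]:
    """Summarize a batch of logged episodes into the four tracked metrics."""
    if not episodes:
        raise ValueError("need at least one episode")
    all_turns = [r for ep in episodes for r in ep.rewards]
    finished = [ep for ep in episodes if ep.completed]
    return {
        # Episode reward taken as the mean of its turn rewards (assumption).
        "avg_reward_per_episode": mean([mean(ep.rewards) for ep in episodes]),
        "completion_rate": len(finished) / len(episodes),
        # Only completed episodes have a meaningful turns-to-completion value.
        "turns_to_completion": (float(mean([len(ep.rewards) for ep in finished]))
                                if finished else float("nan")),
        "low_reward_rate": sum(r < low_threshold for r in all_turns) / len(all_turns),
    }


metrics = training_metrics([
    Episode(rewards=[0.5, 1.0], completed=True),
    Episode(rewards=[0.0, 0.25, 0.5, 0.75], completed=False),
])
```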

Rewards & Evaluation

All rewards are normalized to [0, 1]:
  • Higher rewards (closer to 1): Good progress
  • Lower rewards (closer to 0): Neutral or incorrect actions
Low rewards (close to 0) indicate problematic behavior:
  • Harmful recommendations
  • Incorrect conclusions
  • Off-topic responses
  • Tool misuse
Log your episodes to debug which actions resulted in low scores.
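A quick way to start debugging is to pull the lowest-scoring turns out of your logs. The `Turn` record layout and the 0.1 threshold are assumptions for illustration.

```python
from typing import Iterable, List, NamedTuple


class Turn(NamedTuple):
    episode_id: str
    turn: int
    action: str    # what the model did on this turn
    reward: float  # reward returned by Labs, in [0, 1]


def low_reward_turns(turns: Iterable[Turn], threshold: float = 0.1) -> List[Turn]:
    """Return turns scoring below `threshold`, worst first, for manual review."""
    return sorted((t for t in turns if t.reward < threshold), key=lambda t: t.reward)


log = [
    Turn("ep-1", 0, "asked about symptom onset", 0.80),
    Turn("ep-1", 1, "recommended a prescription dose", 0.02),
    Turn("ep-2", 0, "discussed an unrelated topic", 0.05),
]
for t in low_reward_turns(log):
    print(f"{t.episode_id} turn {t.turn}: {t.reward:.2f}  {t.action}")
```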
For the same scenario state and action, the reward is deterministic. Simulated persona responses, however, may vary slightly to prevent gaming.
The /api/v1/compare endpoint is for preference-based training like DPO. It compares two responses and tells you which is better, plus the margin of difference.
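Compare results map naturally onto the prompt/chosen/rejected format that DPO trainers such as TRL’s `DPOTrainer` expect. The shape of the compare response used here (a `winner` of "a" or "b" and a numeric `margin`) is an assumption, since the exact response schema is not documented on this page; dropping near-ties below the hypothetical `min_margin` is one optional way to filter weak preferences.

```python
from typing import Any, Dict, Optional


def to_preference_pair(prompt: str, response_a: str, response_b: str,
                       result: Dict[str, Any], min_margin: float = 0.1) -> Optional[Dict[str, str]]:
    """Convert one compare result into a prompt/chosen/rejected record.

    Returns None when the margin is below `min_margin` (a near-tie).
    """
    if float(result["margin"]) < min_margin:
        return None
    if result["winner"] == "a":
        chosen, rejected = response_a, response_b
    elif result["winner"] == "b":
        chosen, rejected = response_b, response_a
    else:
        raise ValueError(f"unexpected winner: {result['winner']!r}")
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}


pair = to_preference_pair(
    "Summarize the claim status.", "Draft A", "Draft B",
    {"winner": "b", "margin": 0.4},  # assumed response shape
)
```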

Data & Privacy

We log API requests for debugging and abuse prevention. Logs are retained for 30 days and not used for training our own models. Enterprise plans can opt out of logging.
Security measures include:
  • All communication is over HTTPS
  • API keys are hashed, not stored in plain text
  • We don’t share data between organizations
  • SOC 2 compliance (in progress)
Currently, we don’t provide bulk export of historical episodes. For auditability, we recommend logging episodes on your side during training.
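JSON Lines is a convenient format for this kind of log: each turn is appended as one self-contained line, so an interrupted run leaves every earlier line readable. The function names and record fields below are illustrative, not a Labs schema.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, Iterator


def append_turn(path: Path, record: Dict[str, Any]) -> None:
    """Append one turn as a single JSON line."""
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_turns(path: Path) -> Iterator[Dict[str, Any]]:
    """Yield logged turns in the order they were written, skipping blank lines."""
    with path.open(encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)


# Usage: write two turns to a temporary log file and read them back.
with tempfile.TemporaryDirectory() as tmp:
    log_path = Path(tmp) / "episodes.jsonl"
    append_turn(log_path, {"episode_id": "ep-1", "turn": 0, "action": "greet", "reward": 0.5})
    append_turn(log_path, {"episode_id": "ep-1", "turn": 1, "action": "ask follow-up", "reward": 0.9})
    rows = list(read_turns(log_path))
```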

Billing & Plans

Usage is tracked by:
  • Number of API calls
  • Number of episodes created
  • Batch items evaluated
Check your usage in the Labs Portal dashboard.
New accounts get a free trial with a limited number of API calls. Contact us about trial extensions for research purposes.
To upgrade your plan or discuss enterprise options, contact us through the Labs Portal.

Still Have Questions?

Contact us at labs@tacitintelligence.co or through the Labs Portal support chat.