RAIL Score: Responsible AI Evaluation

Evaluate LLM outputs across 8 responsible AI dimensions: fairness, safety, reliability, transparency, privacy, accountability, inclusivity, and user impact. Each dimension is scored 0-10 with a confidence estimate.

Python SDK: pip install rail-score-sdk | Hugging Face Evaluate metric: evaluate.load('responsible-ai-labs/rail_score')

Get a free API key at responsibleailabs.ai | SDK Docs
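A minimal usage sketch with the Evaluate metric follows. The compute() argument names shown here (predictions, api_key, mode) are assumptions made for illustration; check the metric card and SDK docs for the exact interface.

```python
# Minimal sketch: load and run the RAIL Score metric through the
# Hugging Face `evaluate` library. The keyword names passed to compute()
# are assumed, not confirmed; see the metric card / SDK docs.
import evaluate

rail_score = evaluate.load("responsible-ai-labs/rail_score")

result = rail_score.compute(
    predictions=["<LLM response text to evaluate>"],
    api_key="YOUR_RAIL_API_KEY",  # free key from responsibleailabs.ai
    mode="basic",                 # fast ML classifier, 1 credit
)
print(result)  # per-dimension scores (0-10) with confidence estimates
```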

Mode

Basic: fast ML classifier (1 credit). Deep: LLM-as-judge with explanations (3 credits).
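A deep-mode call follows the same pattern but trades credits for explanations; as in the sketch above, the mode keyword and the result layout are assumptions.

```python
# Sketch of a deep-mode evaluation (LLM-as-judge, 3 credits); `mode` is an
# assumed keyword name, and the result layout may differ from what is shown.
import evaluate

rail_score = evaluate.load("responsible-ai-labs/rail_score")
result = rail_score.compute(
    predictions=["<LLM response text to evaluate>"],
    api_key="YOUR_RAIL_API_KEY",
    mode="deep",  # slower and costlier, but returns per-dimension explanations
)
print(result)
```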

Domain

A content-domain hint used to calibrate scoring.

Use Case

Use-case context that frames the evaluation.

Dimensions (leave empty for all 8)

Select specific dimensions to evaluate, or leave empty to score all.
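For instance, restricting scoring to a few dimensions while passing the optional domain and use-case hints might look like the sketch below; the dimensions, domain, and use_case keyword names are assumptions.

```python
# Sketch: score only selected dimensions, with optional calibration hints.
# All keyword names below are assumed; see the SDK docs for exact parameters.
import evaluate

rail_score = evaluate.load("responsible-ai-labs/rail_score")
result = rail_score.compute(
    predictions=["<LLM response text to evaluate>"],
    api_key="YOUR_RAIL_API_KEY",
    dimensions=["fairness", "privacy", "safety"],  # omit to score all 8
    domain="<content domain hint>",
    use_case="<use case context>",
)
```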

Include Issues

Return detected issue tags per dimension (works best with deep mode).

Include Suggestions

Return improvement suggestions for low-scoring dimensions.
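A sketch of requesting issue tags and suggestions in one call follows; the include_issues and include_suggestions flag names are assumptions.

```python
# Sketch: request issue tags and improvement suggestions alongside scores.
# include_issues / include_suggestions are assumed flag names.
import evaluate

rail_score = evaluate.load("responsible-ai-labs/rail_score")
result = rail_score.compute(
    predictions=["<LLM response text to evaluate>"],
    api_key="YOUR_RAIL_API_KEY",
    mode="deep",                # issue tags work best with deep mode
    include_issues=True,
    include_suggestions=True,
)
```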

Examples
The example inputs cover the following fields: LLM Response, Prompt / Context (optional), RAIL API Key, Mode, Domain, Use Case, Dimensions (leave empty for all 8), Custom Weights (optional), Include Issues, and Include Suggestions.