Ragas

Ragas is an open-source toolkit for evaluating and optimizing LLM applications, offering objective metrics, test data generation, and production feedback loops.

ExplodingGradients · Since 2023-05-08

Loading score...

GitHub Website

Ragas is an open-source toolkit designed to evaluate and optimize LLM applications. It provides objective metrics, automated test-data generation, and production-aligned feedback loops to help teams measure and improve model behavior in real-world scenarios.

Key features

Objective metrics: combine LLM-driven and traditional metrics for fine-grained evaluation.
Test data generation: automatically create diverse, production-aligned test sets.
Integrations: works with popular LLM frameworks (e.g. LangChain) and observability tools for easy production adoption.

Use cases

Evaluation & regression testing: automate checks for model changes and regressions.
Quality engineering: generate test datasets to surface real-world issues early.
Continuous improvement: close the loop using production data to refine models.

Technical notes

Implementation: primarily Python, with examples and extension points.
Extensible metrics: supports pluggable evaluators and LLM-based scorers (AspectCritic).
Deployment: provides CLI and library APIs suitable for local installs and CI integration.

Core Content

Core Content

Technology

Technology

More

More

AI Infrastructure

AI Infrastructure

Explore

Explore

Connect

Connect

Quick Links

Quick Links

LinkedIn

LinkedIn

Follow on X

Follow on X

Ragas

Key features

Use cases

Technical notes

Score Breakdown

Related Resources

Agenta

ReLE Chinese LLM Benchmark

DeepEval