Rhesis

An open-source testing platform and SDK for LLM and agentic applications that generates test scenarios and evaluates model outputs.

Rhesis · Since 2024-10-09

Loading score...

GitHub Website

Detailed Introduction

Rhesis is an open-source testing platform and SDK for large language model (LLM) and agentic applications. Teams describe what their app should and should not do in plain language; Rhesis then generates hundreds of single-turn and multi-turn test scenarios (including adversarial prompts), runs them against the target application, and highlights failures such as hallucinations, data leakage, or policy violations. The platform includes a review UI, SDK, and CI integrations to help cross-functional teams find and fix issues before production.

Main Features

AI-driven test generation: produce broad coverage of adversarial and edge-case inputs for single-turn and multi-turn flows.
LLM-based evaluation: automatically score outputs against requirements using LLM evaluators.
Collaboration workflow: comments, issues, and review tools so non-engineers can define requirements and review results.
Flexible deployment: hosted service or self-hosted Docker stack, with CI/CD friendly interfaces.

Use Cases

Pre-production testing for chatbots, RAG systems, and agentic applications to catch regressions and safety issues.
Integrating automated tests into CI pipelines to prevent unsafe model versions from reaching production.
Compliance and product teams validating model behavior against policy requirements at scale.

Technical Features

Supports single-turn and multi-turn (Penelope) testing to simulate realistic conversation chains.
Built-in metrics library (RAGAS, DeepEval, etc.) and visual reports for diagnostics.
SDK and API support for IDE-based workflows and scripted test automation.
Open-source with modular architecture and community contributions.

Rhesis

Detailed Introduction

Main Features

Use Cases

Technical Features

Score Breakdown

Related Resources

Basic Memory

CocoIndex

mgrep