Detailed Introduction
Rhesis is an open-source testing platform and SDK for large language model (LLM) and agentic applications. Teams describe what their app should and should not do in plain language; Rhesis then generates hundreds of single-turn and multi-turn test scenarios (including adversarial prompts), runs them against the target application, and highlights failures such as hallucinations, data leakage, or policy violations. The platform includes a review UI, SDK, and CI integrations to help cross-functional teams find and fix issues before production.
Main Features
- AI-driven test generation: produce broad coverage of adversarial and edge-case inputs for single-turn and multi-turn flows.
- LLM-based evaluation: automatically score outputs against requirements using LLM evaluators.
- Collaboration workflow: comments, issues, and review tools so non-engineers can define requirements and review results.
- Flexible deployment: hosted service or self-hosted Docker stack, with CI/CD friendly interfaces.
Use Cases
- Pre-production testing for chatbots, RAG systems, and agentic applications to catch regressions and safety issues.
- Integrating automated tests into CI pipelines to prevent unsafe model versions from reaching production.
- Compliance and product teams validating model behavior against policy requirements at scale.
Technical Features
- Supports single-turn and multi-turn (Penelope) testing to simulate realistic conversation chains.
- Built-in metrics library (RAGAS, DeepEval, etc.) and visual reports for diagnostics.
- SDK and API support for IDE-based workflows and scripted test automation.
- Open-source with modular architecture and community contributions.