Overview
ReLE (chinese-llm-benchmark) is a community-maintained Chinese LLM evaluation and leaderboard project that provides fine-grained benchmarks across education, medicine, finance, law, reasoning, language understanding, and multimodal tasks.
Key features
- Extensive benchmark suites and leaderboards, including a large badcase repository.
- Regular releases and changelogs, with tools for model selection and leaderboard viewing.
- Leaderboard data and visualizations for analysis and debugging (see the sketch after this list).
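As a minimal sketch of how the leaderboard data could be used for model comparison, the Python snippet below loads a hypothetical CSV export named `leaderboard.csv` with `model`, `domain`, and `score` columns; the actual file names and column layout in the repository may differ.

```python
# Minimal sketch: rank models from a hypothetical leaderboard export.
import pandas as pd

# Assumed columns: "model", "domain", "score" (the real export may differ).
df = pd.read_csv("leaderboard.csv")

# Average each model's score across domains and sort descending.
ranking = (
    df.groupby("model")["score"]
      .mean()
      .sort_values(ascending=False)
)
print(ranking.head(10))

# Narrow the comparison to a single domain, e.g. legal tasks.
legal = df[df["domain"] == "legal"].sort_values("score", ascending=False)
print(legal[["model", "score"]].head(5))
```

The same pattern extends to any domain column in the export, which is the typical workflow for shortlisting candidate models before deeper evaluation.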
Use cases
- Model evaluation and selection for research and engineering teams focused on Chinese-language LLMs.
- Course material and reading lists for MLSys/LLM courses that need Chinese-language benchmarks.
- Error analysis and badcase collection to improve model robustness.
Technical characteristics
- Maintained as Markdown on GitHub, so it is easy to update via pull requests and community contributions.
- Includes leaderboards, downloadable data, and badcase visualizations for rapid analysis (see the badcase sketch below).
- Some content integrates with a dedicated site (nonelinear.com) for online presentation.
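The sketch below illustrates one way to summarize badcases for error analysis. It assumes a hypothetical `badcases.json` file containing a list of records with `category`, `question`, `expected`, and `answer` fields; the real badcase files in the repository may be organized differently.

```python
# Minimal sketch: summarize a hypothetical badcase dump by task category.
import json
from collections import Counter

# Assumed format: a JSON list of records with "category", "question",
# "expected", and "answer" keys (the repository's actual layout may differ).
with open("badcases.json", encoding="utf-8") as f:
    badcases = json.load(f)

# Count failures per category to see where a model is weakest.
by_category = Counter(case["category"] for case in badcases)
for category, count in by_category.most_common():
    print(f"{category}: {count} badcases")

# Inspect a few raw examples from the worst category for error analysis.
worst = by_category.most_common(1)[0][0]
for case in [c for c in badcases if c["category"] == worst][:3]:
    print(case["question"], "->", case["answer"], "(expected:", case["expected"], ")")
```

Grouping badcases this way makes it straightforward to spot systematic weaknesses (for example, a cluster of failures in legal or reasoning tasks) before targeted fine-tuning or prompt changes.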