Opik

Opik is an open-source LLM evaluation and observability platform that helps teams build, evaluate and optimize LLM applications.

Developed by Comet, Opik provides tracing, evaluation pipelines and dashboards that help improve model quality and production observability.

Key features

  • End-to-end tracing: captures LLM calls, conversation context and agent activity at scale.
  • Advanced evaluation: includes LLM-as-a-judge metrics, dataset-driven evaluations and CI integrations.
  • Production monitoring & rules: online evaluation rules, feedback scoring and Guardrails for production reliability.
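To make the tracing idea concrete, here is a minimal sketch of how nested calls can be captured as parent/child spans. It uses only the Python standard library and is not Opik's SDK: the `traced` decorator, the in-memory `SPANS` list and the `retrieve`/`answer` functions are hypothetical names chosen for illustration.

```python
import functools
import time
import uuid
from contextvars import ContextVar

# Hypothetical in-memory span store; a real platform would send spans to a backend.
SPANS = []
_current_span = ContextVar("current_span", default=None)


def traced(func):
    """Record one span per call and link nested calls to their parent span."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        span = {
            "id": uuid.uuid4().hex,
            "parent_id": _current_span.get(),  # None for a top-level call
            "name": func.__name__,
            "input": {"args": args, "kwargs": kwargs},
        }
        token = _current_span.set(span["id"])
        start = time.perf_counter()
        try:
            span["output"] = func(*args, **kwargs)
            return span["output"]
        except Exception as exc:
            span["error"] = repr(exc)
            raise
        finally:
            span["duration_s"] = time.perf_counter() - start
            _current_span.reset(token)
            SPANS.append(span)  # inner spans finish (and are stored) first
    return wrapper


@traced
def retrieve(question):
    # Stand-in for a retrieval step in a RAG pipeline.
    return ["Opik is an open-source platform."]


@traced
def answer(question):
    context = retrieve(question)
    # Stand-in for an LLM call.
    return f"Based on {len(context)} document(s): {context[0]}"


result = answer("What is Opik?")
```

After the call, `SPANS` holds two records: the `retrieve` span, whose `parent_id` points at the `answer` span, and the top-level `answer` span itself. A tracing platform builds its call trees from this kind of linkage.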

Use cases

  • Evaluating RAG chatbots and dialog systems during development and regression testing.
  • Tracing and optimizing multi-step agents and code-assistant workflows.
  • Monitoring token usage, response quality and anomalies in production with fast investigation tools.
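The regression-testing use case can be sketched as a dataset-driven evaluation loop: run the application over a fixed set of examples, score each output, and fail the run if the mean score drops below a threshold. The sketch below is standard-library Python and does not use Opik's API. `Example`, `keyword_score`, `evaluate` and `toy_app` are hypothetical, and the keyword scorer is a deliberately crude stand-in for an LLM-as-a-judge metric.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Example:
    question: str
    expected: str  # space-separated keywords the answer should contain


def keyword_score(output: str, expected: str) -> float:
    """Fraction of expected keywords found in the output (crude judge stand-in)."""
    keywords = expected.lower().split()
    if not keywords:
        return 0.0
    text = output.lower()
    return sum(word in text for word in keywords) / len(keywords)


def evaluate(app, dataset, scorer, threshold):
    """Score every example and report whether the mean clears the threshold."""
    scores = [scorer(app(ex.question), ex.expected) for ex in dataset]
    avg = mean(scores)
    return {"scores": scores, "mean": avg, "passed": avg >= threshold}


def toy_app(question: str) -> str:
    # Stand-in for the LLM application under test.
    answers = {
        "What does Opik trace?": "It traces LLM calls and agent activity.",
        "Who develops Opik?": "Opik is developed by Comet.",
    }
    return answers.get(question, "I don't know.")


DATASET = [
    Example("What does Opik trace?", "llm calls"),
    Example("Who develops Opik?", "comet"),
]

report = evaluate(toy_app, DATASET, keyword_score, threshold=0.8)
```

In a CI job, a check such as `assert report["passed"]` would turn a quality regression into a failed build.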

Technical notes

  • SDKs & integrations: Python and TypeScript SDKs with integrations for LangChain, LlamaIndex, AutoGen and others.
  • Deployments: supports Comet.com cloud or self-hosted deployment (Docker Compose / Kubernetes) with example scripts.
  • UI & automation: built-in dashboards, Prompt Playground, evaluation rules and Agent Optimizer components.

Opik
Resource Info
Author: Comet
Added Date: 2025-09-27
Tags: OSS, Observation, Evaluation