Introduction
DataHub is an open-source data discovery and metadata platform that originated at LinkedIn. It provides a real-time metadata graph together with lineage, documentation and observability tools that help teams discover and manage data assets across the modern data stack.
Key features
- Real-time metadata graph connecting datasets, dashboards, pipelines and services.
- Wide set of connectors for warehouses, messaging systems and BI tools.
- Demo environments and a comprehensive docs site for quick evaluation.
- Active community and enterprise adoption.
Use cases
- Data catalog for self-serve analytics, so analysts can find and use data on their own.
- Governance and compliance via lineage, auditing and access controls.
- Central metadata layer integrating storage, compute and BI systems.
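
The governance use case above relies on lineage to answer questions such as "what breaks if this table changes?". A rough sketch of that impact analysis, assuming lineage is available as a simple mapping from each asset to its direct downstream assets, is a breadth-first traversal. The asset names and the `downstream_impact` function are hypothetical and do not reflect how DataHub stores or queries lineage.

```python
from collections import deque

# Hypothetical lineage: each key feeds the assets listed as its values.
DOWNSTREAM = {
    "raw.orders": ["staging.orders"],
    "staging.orders": ["marts.revenue", "marts.customers"],
    "marts.revenue": ["dashboard.sales_overview"],
    "marts.customers": [],
}


def downstream_impact(asset: str, lineage: dict) -> list:
    """Return every asset reachable downstream of `asset`, nearest first."""
    seen = {asset}          # guards against cycles and duplicate visits
    order = []              # assets in breadth-first discovery order
    queue = deque([asset])
    while queue:
        current = queue.popleft()
        for child in lineage.get(current, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


impacted = downstream_impact("staging.orders", DOWNSTREAM)
print(impacted)
# → ['marts.revenue', 'marts.customers', 'dashboard.sales_overview']
```

Breadth-first order lists the closest dependents first, which is usually the order in which owners need to be notified about a change.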
Technical characteristics
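- Multi-language implementation with a modular architecture: Java for backend services, TypeScript for the web frontend and Python for ingestion and SDK tooling.
- Quickstart with Docker for local evaluation and Helm charts for Kubernetes, with deployment guides for production setups.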
- Apache 2.0 license and broad industry adoption.