A guide to building long-term compounding knowledge infrastructure. See details on GitHub .

DataHub

The open-source data discovery and metadata platform for the modern data stack, offering a real-time metadata graph, lineage and observability.

Introduction

DataHub is an open-source data discovery and metadata platform that originated from LinkedIn. It provides a real-time metadata graph, lineage, documentation and observability tools to help teams manage and discover data assets across the modern data stack.

Key features

  • Real-time metadata graph connecting datasets, dashboards, pipelines and services.
  • Wide set of connectors for warehouses, messaging systems and BI tools.
  • Demo environments and a comprehensive docs site for quick evaluation.
  • Active community and enterprise adoption.

Use cases

  • Data catalog and self-serve analytics enabling analysts to find and use data.
  • Governance and compliance via lineage, auditing and access controls.
  • Central metadata layer integrating storage, compute and BI systems.

Technical characteristics

  • Multi-language implementation (Java/TypeScript/Python) with modular architecture.
  • Quickstart with Docker and Helm; production-ready deployment guides.
  • Apache 2.0 license and broad industry adoption.

Comments

DataHub
Resource Info
🌱 Open Source 💾 Data 📱 Application