Open-source metadata platform for data governance and AI asset discovery
DataHub operates a metadata management platform built on a graph architecture and designed for discovery, governance, and observability across data and AI systems. The stack is heavy on data infrastructure (Kafka, Spark, Airflow, Snowflake, Redshift, dbt) and backend languages (Python, Java, Scala, Go), with a modern frontend (React, TypeScript, GraphQL). Engineering accounts for 15 of 21 active roles, and current projects focus on connectors, cloud-native ingestion, and real-time metadata processing, which suggests the company is prioritizing platform maturity and breadth over feature velocity.
Notable leadership hires: AI Tech Lead
DataHub provides an open-source and cloud-managed platform for metadata management, data governance, and observability. The platform connects to over 80 data sources and processes millions of metadata events monthly, targeting technical teams and enterprises deploying AI and data systems at scale. The dual open-source and SaaS model (DataHub Core and DataHub Cloud) positions the company for both community adoption and managed service revenue. Founded in 2021 and based in Palo Alto with 51–200 employees, DataHub operates across the United States, India, UAE, and Australia, reflecting distributed engineering and support operations.
DataHub uses PyTorch, TensorFlow, Python, Kafka, Spark, Airflow, Snowflake, Redshift, dbt, React, TypeScript, GraphQL, and Kubernetes. Backend languages and runtimes include Java, Scala, Kotlin, Go, and Node.js.
DataHub is headquartered in Palo Alto, California. The company was founded in 2021 and is privately held with 51–200 employees.