Marketplace platform connecting data holders with AI training data buyers
Protege operates a two-sided marketplace designed to reduce friction in AI training data sourcing. The stack (Python, SQL, Snowflake, Databricks, cloud infrastructure across AWS/GCP/Azure) supports data ingestion and validation workflows. Hiring is balanced across data (5), sales (5), and product (4) with senior/lead-heavy seniority mix—reflecting a sales-led go-to-market phase combined with data ops maturity. Active projects signal expansion into healthcare verticals and partnership integrations, while pain points center on the operational cost and timeline of data deals.
Notable leadership hires: Head of Product Success, Data Solutions Lead
Protege is a data marketplace platform founded in 2024, based in New York City. The company addresses a structural problem in AI: data holders (enterprises, institutions, content owners) lack clear paths to monetize their data, while AI teams spend significant time and resources negotiating access. Protege's platform automates discovery, vetting, and deal flow between these two groups, with built-in governance, IP protection, and security controls. Early traction shows healthcare as a priority vertical, with active initiatives to expand into new segments, establish partnerships, and scale the supplier network.
Python, SQL, Snowflake, Databricks, Apache Spark, and cloud infrastructure (AWS, GCP, Azure). Data handling is core; the stack reflects scale-ready analytics and ML pipeline needs.
Healthcare is the primary focus, with projects around healthcare delivery and data sourcing. Active initiatives also include new vertical launch and early partnership development to expand TAM.
Other companies in the same industry, closest in size