Troveo aggregates licensed video content from thousands of sources and packages it for AI model training. The tech stack (PyTorch, Hugging Face, Kafka, Snowflake, PostgreSQL on AWS) suggests a data-platform-first architecture that supports both raw-footage delivery and downstream ML feature engineering. Active projects on petabyte-scale ingestion pipelines, embedding ML models into backend services, and dataset curation for model developers point to a company building infrastructure that bridges content licensing and AI training workflows.
Troveo licenses video footage from thousands of content providers and prepares it for AI model training. The company maintains a library of over 5 million hours of footage, with pipelines that handle cleaning, annotation, enrichment, and segmentation. It serves model-training teams at AI companies and research labs that need both raw, provenance-verified video and annotated datasets ready for fine-tuning. Founded in 2024 and based in Austin, the company operates as a focused engineering and data organization built around video-at-scale workflows.
Python, Go, Node.js, PyTorch, Hugging Face, AWS, PostgreSQL, Snowflake, Kafka, and observability tools (Prometheus, Grafana, Jaeger) for distributed systems and petabyte-scale data pipelines.
Robust delivery pipelines for petabyte-scale video ingestion, ML infrastructure scaling on AWS, distributed systems for data-pipeline performance, embedding ML models into backend services, and analytical tools for their video library.
Austin, Texas. The company is currently hiring across engineering, data, sales, and marketing roles within the United States.