AI data curation and post-training infrastructure for LLMs
Bespoke Labs builds infrastructure for preparing and curating training data for large language models, with a focus on reinforcement learning for agents. The tech stack leans heavily on orchestration (Kubernetes, Airflow, Spark) and ML ops tooling (MLflow, Weights & Biases, PyTorch, TensorFlow), pointing to distributed data processing and model training at scale. Hiring is predominantly data-focused (25 of 46 open roles) and paired with senior-level engineering, suggesting the company is scaling production systems for high-throughput data pipelines rather than expanding product surface area.
Bespoke Labs is a venture-backed startup building data infrastructure for LLM training pipelines. The company operates across three interconnected domains: curating and evaluating training datasets at scale, designing benchmark tasks that reflect real-world scenarios (extracted from DevOps incidents and similar sources), and running secure, distributed compute systems to process and validate data. Its customer base appears to be AI teams and model builders who need production-grade data preparation workflows. The organization runs lean at 2–10 employees but maintains an aggressive hiring profile across multiple countries, with the steepest headcount growth in data engineering roles.
Primary stack includes Kubernetes, Terraform, GitLab CI/CD, Go, Python, Java, Docker, gRPC, Prometheus, and Grafana for infrastructure; PyTorch, TensorFlow, scikit-learn, NumPy, and Pandas for ML; and Apache Spark and Airflow for data orchestration, with MLflow and Weights & Biases for experiment tracking.
Mountain View, California, United States.