AI trainer network and annotation platform for model training at scale
Sourcebae operates a distributed workforce platform focused on AI model training and data annotation. The tech stack spans Oracle cloud infrastructure, Kafka streaming, Databricks analytics, and computer vision (YOLO, OpenCV), reflecting both legacy enterprise integrations and emerging AI workflows. Hiring is heavily engineering-focused, with 22 senior/mid-level engineers and 10 data roles. Active projects reveal two parallel priorities: enterprise system implementations (SAP, Oracle Order Management) and AI-specific work (LLM fact-checking, multi-modal fusion, Gemini personalization features).
Sourcebae provides human-in-the-loop AI training through a global network of vetted trainers and annotators. The company operates three primary service lines: AI trainers for reinforcement learning from human feedback (RLHF) and model supervision; annotation teams for data labeling and QA; and contract developers for agile AI projects. Compliance and workforce operations are handled end-to-end. Founded in 2022 and based in India with 51–200 employees, Sourcebae targets AI development teams at organizations building and refining large language models and multimodal systems.
Engineering uses Java, Spring Boot, Angular, and AWS (Lambda, SQS). Data pipelines run on Kafka, Databricks, and Snowflake. The backend includes Oracle cloud (EBS, ASCP, Forms) and ServiceNow, alongside computer vision libraries (YOLO, OpenCV).
Projects span AI model training (LLM fact-checking, Gemini personalization, multi-modal fusion) and enterprise cloud implementations (Oracle Order Management, SAP cloud integration, legacy data migration using FBDI).