Together AI operates a purpose-built GPU cloud for model training, fine-tuning, and inference, with a tech stack centered on PyTorch, Kubernetes, and custom inference engines (vLLM, SGLang, Triton). The list of technologies being adopted (RoCEv2, InfiniBand, Ceph) signals heavy infrastructure investment, while current projects focus on distributed GPU scheduling, multi-petabyte storage systems, and cost optimization, indicating the company is scaling toward larger, multi-region deployments and addressing the operational complexity of managing GPU fleets at cloud scale.
Together AI is an AI infrastructure company providing GPU cloud services to AI teams at enterprise SaaS companies and AI-native startups. Founded in 2022 and headquartered in San Francisco, the company operates a platform for training, fine-tuning, and serving large language models on optimized hardware clusters. The engineering-focused organization (35+ roles) is building distributed systems for GPU scheduling, storage at petabyte scale, and usage-based billing. Active hiring spans the United States, United Kingdom, Netherlands, and India, with seniority concentrated in senior and staff levels.
Core: PyTorch, Python, TypeScript, CUDA, Kubernetes. Inference: vLLM, SGLang, Triton. Infrastructure: AWS (EKS, Kinesis, Lake Formation), Terraform, Argo CD. Workflow and streaming: Apache Airflow, Kafka. Now adopting: RoCEv2, InfiniBand, and Ceph for high-speed networking and distributed storage.
Current projects include efficient inference engines, distributed GPU scheduling, multi-petabyte AI/ML storage systems built on Kubernetes-native operators, usage-based billing, global data center strategy, and cost optimization for large-scale model serving.
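The usage-based billing mentioned above typically reduces to metering GPU-seconds per accelerator type and pricing them at a per-second rate. A minimal sketch of that idea, assuming hypothetical rates and record names (nothing here reflects Together AI's actual pricing or internal APIs):

```python
from dataclasses import dataclass

# Hypothetical per-second USD rates by GPU type; illustrative only.
RATES_PER_SECOND = {
    "H100": 0.00122,
    "A100": 0.00068,
}

@dataclass
class UsageRecord:
    """One metered slice of GPU usage (hypothetical schema)."""
    gpu_type: str
    gpu_count: int
    seconds: int

def bill(records):
    """Aggregate metered GPU-seconds into per-type and total charges."""
    by_type = {}
    total = 0.0
    for r in records:
        cost = RATES_PER_SECOND[r.gpu_type] * r.gpu_count * r.seconds
        by_type[r.gpu_type] = by_type.get(r.gpu_type, 0.0) + cost
        total += cost
    return by_type, round(total, 2)
```

For example, `bill([UsageRecord("H100", 8, 3600)])` charges one hour on an 8-GPU node; a production system would add minimum-billing increments, discounts, and idempotent event ingestion on top of this core aggregation.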