Cast AI automates cost and performance management for Kubernetes and AI workloads across AWS, GCP, and Azure. Its tech stack points to a company built around inference optimization (vLLM, SGLang, TensorRT, PyTorch), layered atop container orchestration, analytics, and observability tooling (Kubernetes, ClickHouse, Prometheus, Grafana, Tempo). Active projects on quantization schemes, automated inference configuration, and reducing GPU over-provisioning all target the same pain point: manual tuning and cost inefficiency in LLM serving. This suggests the platform is moving beyond generic Kubernetes cost-cutting into AI-specific resource management.
Cast AI builds an automation platform for Kubernetes and AI workload optimization across multi-cloud environments. The company targets engineering teams running containerized applications and generative AI inference on AWS, GCP, and Azure, addressing both performance reliability and operational cost. Founded in 2019 and based in Miami, the company has 201–500 employees and is currently accelerating hiring, particularly for engineering and product roles across Europe, North America, and Asia. The product surface spans container resource optimization, inference performance tuning, and observability built on open telemetry standards.
Cast AI operates on AWS, Google Cloud Platform (GCP), and Microsoft Azure. The platform is designed for multi-cloud Kubernetes deployments and uses managed database services such as Amazon RDS, Google Cloud SQL, and Azure SQL Database.
Cast AI's stack includes Go and Python for core services; Kubernetes for orchestration, with GitLab CI/CD and Argo CD for continuous delivery; Prometheus, Grafana, Loki, and Tempo for observability; and vLLM, SGLang, PyTorch, and TensorRT for AI inference optimization.
Cast AI's technology stack, projects, and hiring signals are inferred from public hiring and company data (career pages, public listings, and company web presence), then clustered and de-duplicated. Figures are estimates that refresh over time.
This is not an official vendor or customer list. It is a technology-adoption signal inferred from public data, intended for B2B research.