echoloc

Cast AI Tech Stack

Kubernetes and inference optimization for multi-cloud AI workloads

Software Development · Miami, FL · 201–500 employees · Founded 2019 · Privately Held

Cast AI automates cost and performance management for Kubernetes and AI workloads across AWS, GCP, and Azure. The tech stack points to a company built around inference optimization (vLLM, SGLang, TensorRT, PyTorch), alongside the ClickHouse analytical database, layered atop container orchestration and observability (Kubernetes, Prometheus, Grafana, Tempo). Active projects on quantization schemes, inference configuration automation, and GPU over-provisioning address a consistent pain: manual tuning and cost inefficiencies in LLM serving. This suggests the platform is moving beyond generic Kubernetes cost-cutting into AI-specific resource management.

Tech Stack 65 technologies

Core Stack: Go, Python, Kubernetes, AWS, PostgreSQL, gRPC, Prometheus, Grafana, Loki, GitLab CI/CD, ArgoCD, MySQL, C++, ClickHouse, Helm, AWS RDS, PyTorch, Auth0, Okta, GCP, Azure, Google Cloud Pub/Sub, Tempo, Envoy, Cloud SQL, Azure SQL Database, vLLM, SGLang, TensorRT, controller-runtime, +34 more

What Cast AI Is Building

Challenges

  • Manual tuning
  • Cost inefficiencies in LLM inference
  • Inefficient manual decision-making
  • Over-provisioning GPUs
  • Manual decision-making
  • Database configuration optimization
  • Query performance issues
  • Performance degradation detection
  • Inconsistent inference performance
  • Manual decision-making in Kubernetes

Active Projects

  • Kimchi inference optimization system
  • Greenfield database optimization platform
  • Continuous automation of inference configuration
  • Quantization scheme shipping
  • Product scaling initiative
  • User research initiative
  • Backend system design initiative
  • Customer advocacy program
  • Community touchpoints and campaigns
  • Customer reference database

Hiring Activity

Accelerating · 40 roles · 40 in 30d

Department

Engineering: 19
Product: 12
Sales: 7
Marketing: 2
Support: 2

Seniority

Senior: 31
Mid: 4
Director: 3
Junior: 2
Manager: 2

About Cast AI

Cast AI builds an automation platform for Kubernetes and AI workload optimization across multi-cloud environments. The company targets engineering teams running containerized applications and generative AI inference on AWS, GCP, and Azure, addressing both performance reliability and operational cost. Founded in 2019 and based in Miami, the company has 201–500 employees and is accelerating hiring, particularly in engineering and product roles across Europe, North America, and Asia. The product surface spans container resource optimization, inference performance tuning, and observability through open telemetry standards.

Headquarters: Miami, FL
Company Size: 201–500 employees
Founded: 2019
Hiring Markets: Bulgaria, Poland, Austria, Germany, France, Netherlands, United States, United Kingdom

Frequently Asked Questions

What cloud platforms does Cast AI support?

AWS, Google Cloud Platform (GCP), and Microsoft Azure. The platform is designed for multi-cloud Kubernetes deployments and uses managed cloud database services such as AWS RDS, Cloud SQL, and Azure SQL Database.

What is Cast AI's tech stack built on?

Go and Python for core services; Kubernetes for orchestration; GitLab CI/CD and ArgoCD for build and deployment; Prometheus, Grafana, Loki, and Tempo for observability; and vLLM, SGLang, PyTorch, and TensorRT for AI inference optimization.

How this profile is built

Cast AI's technology stack, projects, and hiring signals are inferred from public hiring and company data (career pages, public listings, and company web presence), then clustered and de-duplicated. Figures are estimates that refresh over time.

This is not an official vendor or customer list. It is a technology-adoption signal inferred from public data, intended for B2B research.