GPU infrastructure and ML deployment platform for AI workloads
GMI Cloud operates a GPU-centric AI infrastructure platform built on NVIDIA hardware, Kubernetes, and orchestration tools such as Slurm and Ray. The stack points to a focus on managing large-scale training and inference workloads, reinforced by active projects in GPU/CPU provisioning automation and an AI inference engine. Hiring is accelerating, led by engineering (19 open roles); a notable gap in US talent sourcing and a concurrent emphasis on supply chain optimization signal scaling pains in both product delivery and vendor operations.
Notable leadership hires: Sourcing Director, Account Director
GMI Cloud provides GPU infrastructure and ML/LLM deployment services for businesses running large-scale AI workloads. Founded in 2023 and based in Mountain View, the company operates a 51–200-person organization across engineering, sales, operations, and product functions. The infrastructure layer spans NVIDIA GPUs, Kubernetes orchestration, and open-source ML frameworks (Ray, vLLM, SGLang), while the platform surface handles model integration, virtualization, and deployment. Current operational priorities include cluster stability, GPU utilization optimization, vendor lifecycle management, and converting proofs-of-concept into commercial contracts.
NVIDIA GPUs paired with Kubernetes orchestration, Slurm for workload management, Ray for distributed ML, and storage layers including Ceph and NFS. The stack also integrates Azure, Google Cloud, and OCI for multi-cloud flexibility.
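To make the Slurm piece of such a stack concrete, the sketch below generates a generic sbatch script for a multi-GPU job. All names (partition, job name, training command) are illustrative assumptions, not GMI Cloud's actual configuration:

```python
# Hypothetical sketch: rendering a Slurm batch script for a multi-GPU
# training job. Partition, job name, and command are assumed examples.

def make_sbatch(job_name: str, gpus: int, nodes: int = 1,
                partition: str = "gpu",
                command: str = "python train.py") -> str:
    """Render an sbatch script requesting `gpus` GPUs per node."""
    return "\n".join([
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --partition={partition}",
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --gres=gpu:{gpus}",  # generic GPU resource request
        "#SBATCH --time=04:00:00",
        "",
        "srun " + command,
    ])

script = make_sbatch("llm-finetune", gpus=8, nodes=2)
print(script)
```

On a cluster like the one described, a script of this shape would be submitted with `sbatch`, with Slurm handling GPU allocation across nodes.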
Automation for GPU/CPU provisioning, AI inference engine development, cluster stability and performance debugging, cost optimization reviews, and ERP integration for vendor workflows. ATS implementation and transitioning POCs to commercial agreements are in progress.
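The GPU utilization optimization mentioned above typically reduces to a placement problem: deciding which GPU should host each workload. The sketch below shows one simple greedy strategy (biggest job first, onto the GPU with the most free memory); the data structures and capacity figures are illustrative assumptions, not GMI Cloud's actual provisioning logic:

```python
# Hypothetical sketch of greedy GPU placement for utilization optimization.
# GPU names and memory sizes are illustrative, not real inventory.
from dataclasses import dataclass, field

@dataclass
class Gpu:
    name: str
    free_gb: float
    jobs: list = field(default_factory=list)

def place(jobs: dict, gpus: list) -> dict:
    """Assign each job (name -> required GB) to the GPU with the most free memory.

    Jobs are placed largest-first; a job that fits nowhere maps to None
    (in a real system it would queue or trigger scale-out).
    """
    placement = {}
    for job, need in sorted(jobs.items(), key=lambda kv: -kv[1]):
        best = max(gpus, key=lambda g: g.free_gb)
        if best.free_gb < need:
            placement[job] = None  # no capacity on any GPU
            continue
        best.free_gb -= need
        best.jobs.append(job)
        placement[job] = best.name
    return placement

gpus = [Gpu("h100-0", 80.0), Gpu("h100-1", 80.0)]
print(place({"train-a": 60, "infer-b": 24, "infer-c": 24}, gpus))
```

Production schedulers (Kubernetes device plugins, Slurm's GRES tracking) solve the same problem with far more constraints, but the core trade-off between packing density and fragmentation is the one this toy version exposes.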
Other companies in the same industry, closest in size