AI inference platform for cost-efficient LLM deployment at scale
ElastixAI is building a systems-level inference platform from hardware primitives up: the stack spans RTL design, LLVM/XLA compilers, and deployment frameworks (vLLM, SGLang, TensorRT-LLM, DeepSpeed). The hiring pattern (nearly all senior engineers, no sales or product roles yet) and the project focus on kernel decomposition and hardware roadmaps suggest the company is still in a research-to-production phase, attacking inference efficiency and throughput as the core constraints.
ElastixAI, founded in 2025, is a Seattle-based startup building next-generation inference infrastructure for generative AI workloads. The platform targets cost and efficiency as the primary levers: the team is working on LLM operation decomposition, performance-power-area (PPA) trade-off analysis, and custom hardware roadmaps to cut inference latency and relieve throughput bottlenecks. Current product work centers on core engine architecture and RTL verification, positioning the company to serve infrastructure teams at hyperscalers and enterprises running large language models at scale.
Core stack spans hardware design (Verilog, SystemVerilog), compilers (MLIR, LLVM, XLA), ML frameworks (PyTorch, TensorFlow, JAX), and inference engines (vLLM, SGLang, TensorRT-LLM, DeepSpeed). Deployment targets AWS, GCP, and Azure via Kubernetes and Docker.
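As a concrete reference point for the serving layer, here is a minimal sketch of batched offline inference with vLLM, one of the engines named in the stack. The model name, prompts, and sampling settings are illustrative assumptions, not details from ElastixAI's deployment.

```python
# Minimal vLLM offline-inference sketch (illustrative; not ElastixAI code).
from vllm import LLM, SamplingParams

prompts = [
    "Explain KV-cache paging in one sentence.",
    "Why does batching improve GPU utilization?",
]
# Sampling settings are arbitrary placeholders.
sampling_params = SamplingParams(temperature=0.7, max_tokens=64)

# LLM() loads the model weights and allocates paged KV-cache blocks.
# The model identifier here is an assumption for the example.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")

# generate() schedules the prompts with continuous batching and returns
# one RequestOutput per prompt.
for output in llm.generate(prompts, sampling_params):
    print(output.outputs[0].text)
```

Engines like vLLM, SGLang, and TensorRT-LLM compete on exactly the throughput and latency bottlenecks the profile describes, which is why they sit at the top of the stack above the compiler and hardware layers.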
Core projects include AI inference engine development, RTL design and verification, LLM operation decomposition into kernel primitives (sketched below), and hardware roadmap design. The primary engineering pain points are inference efficiency, throughput, and latency.
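To make "operation decomposition into kernel primitives" concrete, the sketch below breaks one common LLM operation, scaled dot-product attention, into the individual primitives (matmul, scale, softmax, matmul) that a compiler or custom accelerator would schedule and analyze for PPA trade-offs. This is a generic PyTorch illustration under that assumption, not ElastixAI's actual decomposition.

```python
# Illustrative decomposition of attention into kernel primitives
# (generic sketch; not ElastixAI code).
import math
import torch

def attention_as_primitives(q, k, v):
    """q, k, v: (batch, heads, seq, head_dim) tensors."""
    # Primitive 1: batched matmul Q @ K^T
    scores = torch.matmul(q, k.transpose(-2, -1))
    # Primitive 2: elementwise scale by 1/sqrt(head_dim)
    scores = scores / math.sqrt(q.size(-1))
    # Primitive 3: row-wise softmax
    probs = torch.softmax(scores, dim=-1)
    # Primitive 4: batched matmul with V
    return torch.matmul(probs, v)

q = k = v = torch.randn(1, 8, 128, 64)
out = attention_as_primitives(q, k, v)
assert out.shape == (1, 8, 128, 64)
```

Each primitive can then be costed separately (FLOPs, memory traffic, power) when evaluating hardware roadmap options, which is the kind of analysis the project list implies.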