AI chip design and software stack for data center inference
Enflame designs hardware and software for AI inference at scale, built on a deep compiler and systems stack (LLVM, CUDA, TensorFlow, PyTorch, NCCL, DeepSpeed). The project list—chip interconnect, quantization tools, operator development, and chiplet architecture—reveals a vertically integrated play from silicon through ML framework adaptation. Heavy senior engineering hiring (20 of 28 open roles), combined with active work on bandwidth efficiency and power consumption, points to a company solving hard performance and cost problems in domestic AI infrastructure.
Enflame, founded in 2018 and based in Shanghai, builds AI chips and software systems for data center workloads. The company operates across the full stack: hardware design (chiplet and interconnect architecture), compiler tooling (day-0 adaptation for custom AI processors), and runtime optimization (quantization, sparsification, operator kernels). Active development spans distributed communication, inference optimization, and toolchains for their custom GPU-class processors. The organization is engineering-concentrated, with R&D leadership drawn from semiconductor and systems backgrounds.
Core stack includes LLVM, GCC, CUDA, RISC-V, PyTorch, TensorFlow, Caffe, MXNet, PaddlePaddle, TensorRT, NCCL, DeepSpeed, MPI, and Open MPI, spanning compilers, ML frameworks, and distributed training/inference.
Active projects include AI chip interconnect and chiplet architecture design, low-bit quantization and sparsification tools, custom AI operator development, and day-0 software adaptation toolchains for their processors.
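To make the low-bit quantization work concrete, here is a minimal sketch of symmetric per-tensor int8 post-training quantization, the standard technique behind such tools. The function names are illustrative, not Enflame APIs, and real toolchains add per-channel scales, calibration, and hardware-specific rounding.

```python
def quantize_int8(weights):
    """Map float weights to int8 using a single symmetric scale.

    The scale maps the largest-magnitude weight to 127, the int8 limit.
    (Illustrative sketch only, not a vendor API.)
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs else 1.0
    # Round to nearest integer and clamp to the int8 range.
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float values from int8 codes."""
    return [v * scale for v in q]


weights = [0.5, -1.27, 0.003, 1.27]
q, scale = quantize_int8(weights)
recon = dequantize(q, scale)
```

Per-tensor symmetric scaling like this halves memory traffic fourfold versus fp32, which is why it pairs naturally with the bandwidth-efficiency work mentioned above; reconstruction error is bounded by half a quantization step.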