Custom silicon and compiler stack for AI inference at scale
Persimmons designs full-stack AI inference hardware and software, from custom silicon (ASICs, chiplets) through compilers (LLVM, MLIR, XLA, IREE) to communication libraries (NCCL, MPI), for datacenter and edge deployment. The engineering-heavy org (10 of 11 roles) skews senior, with active projects spanning chiplet verification, compiler optimization, and multi-node communication, a signal that the company is still solving internal scaling bottlenecks (communication across thousands of nodes, timing closure, ASIC verification complexity) rather than shipping a product.
Founded in 2023, Persimmons builds custom inference silicon and the software stack required to deploy generative AI workloads efficiently across edge devices and large-scale HPC clusters. The company operates from San Jose with a small, senior engineering team. Its approach spans hardware design (ASIC and chiplet architecture), compiler infrastructure (LLVM/MLIR-based optimization), and communication protocols for distributed inference, targeting both latency optimization at the silicon level and scalability across thousands of compute nodes in datacenter environments.
LLVM, MLIR, XLA, IREE, PyTorch, TensorFlow, JAX, Halide, C++, Python, SystemVerilog, Verilog, UVM, NCCL, ROCm, MPI, ASIC design tools (Cadence, Synopsys), and PCIe. No active adoptions or replacements recorded.
Chiplet design and verification, AI model compiler optimization, novel scheduling algorithms, and communication libraries for multi-node AI clusters. Internal challenges include scaling communication across thousands of nodes, timing closure in ASIC design, and runtime latency optimization for modern AI workloads.
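To make the multi-node communication challenge concrete, here is a minimal in-process sketch of the ring all-reduce pattern, the kind of collective that libraries such as NCCL and MPI implement for distributed inference and training. This is an illustrative simulation, not Persimmons code: the function name and data layout are assumptions, and real implementations overlap the sends with computation on accelerator links.

```python
def ring_allreduce(node_buffers):
    """Sum-reduce equal-length vectors across N simulated nodes.

    Each node's buffer is split into N chunks. In the reduce-scatter
    phase, chunks circulate around the ring and accumulate, so each
    node ends up holding one fully reduced chunk. In the all-gather
    phase, those reduced chunks circulate again until every node holds
    the complete sum. Per-node traffic is ~2*(N-1)/N of the data size,
    roughly independent of N, which is why the pattern scales to
    thousands of nodes.
    """
    n = len(node_buffers)
    size = len(node_buffers[0])
    assert size % n == 0, "buffer must split evenly into n chunks"
    chunk = size // n
    bufs = [list(b) for b in node_buffers]  # copy; inputs stay intact

    # Reduce-scatter: in step s, node r sends chunk (r - s) mod n to
    # node r+1, which adds it into its own copy of that chunk.
    for step in range(n - 1):
        msgs = []
        for r in range(n):  # snapshot all sends before applying them
            c = (r - step) % n
            msgs.append((r, c, bufs[r][c * chunk:(c + 1) * chunk]))
        for r, c, data in msgs:
            dst = (r + 1) % n
            for i, v in enumerate(data):
                bufs[dst][c * chunk + i] += v

    # All-gather: node r now holds the fully reduced chunk (r + 1) mod n;
    # circulate the reduced chunks, overwriting, until all nodes have all.
    for step in range(n - 1):
        msgs = []
        for r in range(n):
            c = (r + 1 - step) % n
            msgs.append((r, c, bufs[r][c * chunk:(c + 1) * chunk]))
        for r, c, data in msgs:
            dst = (r + 1) % n
            bufs[dst][c * chunk:(c + 1) * chunk] = data
    return bufs
```

Staging each step's messages before applying them mimics the synchronous exchanges of a real ring; the bandwidth-optimal chunking is the same idea behind NCCL's ring algorithm and MPI_Allreduce ring variants.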