
Inferact Tech Stack

LLM inference engine optimization and hardware integration

Software Development · San Francisco, CA · 11–50 employees · Founded 2025 · Privately Held

Inferact is a systems-level infrastructure company founded by core maintainers of vLLM, focused on kernel optimization, accelerator integration, and operational deployment for LLM inference. The engineering-heavy, senior-focused team (four senior and one staff engineer across five open roles) reflects a deep IC culture built around low-level performance work—CUDA, Triton, MLIR, and FlashAttention dominate the stack—rather than sales velocity. Active projects span kernel engineering, hardware enablement, and cluster management, directly addressing the pain points driving the company: accelerator utilization, inference latency, and operational scale.

Tech Stack (30 technologies)

Core Stack: C++, Python, Rust, Go, PyTorch, Kubernetes, Terraform, Helm, AWS, CUDA, Triton, NVIDIA Nsight, FlashAttention, NVIDIA, AMD, TPU, Intel, LLVM, MLIR, XLA, vLLM, TensorRT-LLM, SGLang, GCP, Azure, Ray, Slurm, Unsloth, NVLink, InfiniBand

What Inferact Is Building

Challenges

  • Maximizing accelerator performance
  • Integrating new hardware
  • Optimizing inference speed
  • Making inference cheaper and faster
  • Optimizing model execution across hardware
  • Scaling inference globally
  • Operational reliability for ML systems
  • Deployment automation for AI models
  • Operational complexity at massive scale
  • Supporting larger models

Active Projects

  • Kernel optimization for the vLLM inference engine
  • Accelerator integration for new hardware
  • Performance tuning for inference engines
  • Inference runtime
  • Kernel engineering
  • Cloud orchestration
  • Operational backbone for vllm
  • Cluster management system
  • Deployment automation for AI models
  • Diffusion model serving

Hiring Activity

Minimal · 5 roles · 0 in last 30 days

Department: Engineering (5)

Seniority: Senior (4), Staff (1)

About Inferact

Inferact builds infrastructure to optimize LLM inference across diverse hardware and deployment contexts. The company was founded by the creators and core maintainers of vLLM, an open-source inference engine. The product and organizational focus is on kernel-level performance, integrating new accelerators (NVIDIA, AMD, Intel, TPU), and automating deployment at scale. The tech stack—CUDA, Triton, C++, PyTorch, TensorRT-LLM, Kubernetes, and cluster orchestration tools (Slurm, Ray)—reflects a systems engineering organization solving problems around model execution speed, cost, and reliability. Inferact operates from San Francisco with a small, senior-heavy team.

Headquarters: San Francisco, CA
Company Size: 11–50 employees
Founded: 2025
Hiring Markets: United States

Frequently Asked Questions

What is Inferact's tech stack?

Core: CUDA, Triton, C++, Python, PyTorch, vLLM, LLVM, MLIR. Infrastructure: Kubernetes, Terraform, Helm, Slurm, Ray. Hardware support: NVIDIA, AMD, Intel, TPU via XLA and TensorRT-LLM.

What is Inferact working on?

Kernel optimization for vLLM, accelerator integration, inference runtime performance, cluster management, deployment automation, and diffusion model serving. Primary focus: maximizing hardware utilization and reducing inference cost and latency at scale.
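Much of the cost-and-latency work described above comes down to KV-cache memory management. As a toy illustration of the block-based ("paged") allocation idea popularized by vLLM's PagedAttention — not Inferact's actual code; the block size and class names here are illustrative — a minimal allocator might look like:

```python
# Toy sketch of paged KV-cache bookkeeping (the idea behind vLLM's
# PagedAttention): KV memory is carved into fixed-size blocks, and each
# sequence holds a list of block IDs instead of one contiguous slab, so
# memory is allocated on demand and freed exactly when a request finishes.

BLOCK_SIZE = 16  # tokens per KV block (illustrative)

class PagedKVAllocator:
    def __init__(self, num_blocks: int):
        self.free_blocks = list(range(num_blocks))
        self.tables: dict[int, list[int]] = {}  # seq_id -> block IDs

    def append_token(self, seq_id: int, position: int) -> None:
        """Reserve a new block only when a sequence crosses a block boundary."""
        table = self.tables.setdefault(seq_id, [])
        if position % BLOCK_SIZE == 0:  # first token of a fresh block
            if not self.free_blocks:
                raise MemoryError("KV cache exhausted; preempt or swap")
            table.append(self.free_blocks.pop())

    def free(self, seq_id: int) -> None:
        """Return all of a finished sequence's blocks to the pool."""
        self.free_blocks.extend(self.tables.pop(seq_id, []))

alloc = PagedKVAllocator(num_blocks=8)
for pos in range(40):  # a 40-token sequence needs ceil(40/16) = 3 blocks
    alloc.append_token(seq_id=0, position=pos)
print(len(alloc.tables[0]))    # 3
alloc.free(0)
print(len(alloc.free_blocks))  # 8
```

The design point this sketch captures is why paged allocation raises utilization: fragmentation is bounded to at most one partially filled block per sequence, instead of over-reserving a max-length contiguous region per request.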
