GPU-optimized inference platform for open-weight LLMs at production scale
FriendliAI operates a specialized inference engine built on continuous batching, a foundational serving technique the team invented. The stack (Python, Rust, C++, CUDA, ROCm, Triton, Kubernetes) reflects deep GPU systems work; hiring leans heavily toward senior and staff engineers, and the marketing presence is minimal, indicating a developer-first, research-driven positioning. Active projects span custom GPU kernels, multi-modal pipelines, and an agent execution platform, suggesting the company is moving beyond basic LLM serving into more complex inference workloads.
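Continuous batching schedules at the granularity of individual decoding iterations: new requests join the running batch and finished sequences release their slots between decode steps, rather than waiting for a whole batch to complete. The Python sketch below is a toy illustration of that scheduling loop, not FriendliAI's engine; the `Request` and `ContinuousBatcher` names are hypothetical, and `model_step` stands in for a real batched forward pass.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class Request:
    prompt: list[int]                  # prompt token ids
    max_new_tokens: int
    generated: list[int] = field(default_factory=list)

class ContinuousBatcher:
    """Toy iteration-level scheduler: sequences join and leave the
    running batch between decode steps instead of per batch."""

    def __init__(self, model_step, max_batch_size: int, eos_id: int):
        self.model_step = model_step       # fn: list[Request] -> list[int]
        self.max_batch_size = max_batch_size
        self.eos_id = eos_id
        self.waiting: deque = deque()
        self.running: list[Request] = []

    def submit(self, req: Request) -> None:
        self.waiting.append(req)

    def step(self) -> list[Request]:
        # 1) Admit waiting requests into any free batch slots.
        while self.waiting and len(self.running) < self.max_batch_size:
            self.running.append(self.waiting.popleft())
        if not self.running:
            return []
        # 2) Run exactly one decoding iteration for the whole batch.
        next_tokens = self.model_step(self.running)
        finished, still_running = [], []
        for req, tok in zip(self.running, next_tokens):
            req.generated.append(tok)
            # 3) Retire finished sequences immediately; their slots are
            #    reusable on the very next iteration (the key idea).
            if tok == self.eos_id or len(req.generated) >= req.max_new_tokens:
                finished.append(req)
            else:
                still_running.append(req)
        self.running = still_running
        return finished
```

Compared with request-level batching, this keeps GPU utilization high under mixed output lengths: a short completion no longer blocks the batch while long completions finish.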
FriendliAI builds an inference platform optimized for running open-weight and custom AI models on GPU infrastructure. Founded in 2021 and based in San Francisco, the company targets AI engineers and ML teams seeking production-grade model deployment with lower latency and cost than closed-model APIs. The platform includes a proprietary inference engine, custom GPU kernel development, and a web interface for multi-modal model deployment. The company operates as a small, engineering-heavy organization with active hiring in the United States and South Korea.
Core languages: Python, Rust, C++, Go. GPU/compute: NVIDIA CUDA, ROCm, Triton. Serving: FastAPI, gRPC, GraphQL. Data/ops: PostgreSQL, Kubernetes, OpenTelemetry. ML tooling: Hugging Face. Frontend: React, Next.js, TypeScript.
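To make the serving layer concrete, here is a minimal FastAPI sketch of a generation endpoint. The `/v1/generate` route and the request/response fields are hypothetical illustrations of this kind of API surface, not FriendliAI's actual interface.

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class GenerateRequest(BaseModel):
    model: str             # e.g. a Hugging Face model id
    prompt: str
    max_tokens: int = 256

class GenerateResponse(BaseModel):
    text: str
    usage_tokens: int

@app.post("/v1/generate", response_model=GenerateResponse)
async def generate(req: GenerateRequest) -> GenerateResponse:
    # Placeholder: a real implementation would enqueue the request with
    # the inference engine and await the decoded tokens.
    text = f"[echo:{req.model}] {req.prompt}"
    return GenerateResponse(text=text, usage_tokens=len(text.split()))
```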
Active projects include multi-modal inference pipelines, custom GPU kernel optimization, an agent execution platform, a proprietary engine supporting 450k models, and a web platform for model deployment. Focus areas span low-latency inference, performance profiling, and cross-vendor GPU support (NVIDIA and AMD).
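Triton kernels, written in Python and compiled for both CUDA and ROCm backends, are one common route to this kind of cross-vendor kernel work. Below is a minimal illustrative fused elementwise kernel; it is a generic example, not an actual FriendliAI kernel.

```python
import torch
import triton
import triton.language as tl

@triton.jit
def fused_add_relu_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK: tl.constexpr):
    # Each program instance handles one BLOCK-sized tile of the tensors.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK + tl.arange(0, BLOCK)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    # Fuse the add and the ReLU into a single pass over memory.
    tl.store(out_ptr + offsets, tl.maximum(x + y, 0.0), mask=mask)

def fused_add_relu(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = x.numel()
    grid = (triton.cdiv(n, 1024),)
    fused_add_relu_kernel[grid](x, y, out, n, BLOCK=1024)
    return out
```

Fusing the two operations avoids materializing the intermediate sum in global memory, the kind of memory-bandwidth saving that low-latency inference kernels target, and the same source compiles for NVIDIA and AMD GPUs.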