FriendliAI Tech Stack

GPU-optimized inference platform for open-weight LLMs at production scale

Software Development · San Francisco, California · 11–50 employees · Founded 2021 · Privately Held

FriendliAI operates a specialized inference engine built on continuous batching, a foundational serving technique the team invented. The stack (Python, Rust, C++, CUDA, ROCm, Triton, Kubernetes) reflects deep GPU systems work; hiring leans heavily toward senior and staff engineers with minimal marketing presence, indicating a developer-first, research-driven positioning. Active projects span custom GPU kernels, multi-modal pipelines, and an agent execution platform, suggesting the company is moving beyond basic LLM serving into more complex inference workloads.
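The core idea behind continuous batching is that a waiting request joins the GPU batch the moment any slot frees, instead of waiting for the whole batch to drain as in static batching. The toy scheduler below illustrates the scheduling logic only; the `Request` and `serve` names are hypothetical and this is not FriendliAI's implementation.

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class Request:
    rid: int                # request id
    max_new_tokens: int     # tokens to generate before this request finishes
    generated: int = 0      # tokens emitted so far

def serve(requests, batch_size):
    """Count decode steps under continuous batching: a waiting request
    is admitted as soon as any batch slot frees, rather than after the
    entire batch finishes (static batching)."""
    waiting = deque(requests)
    running = []
    steps = 0
    while waiting or running:
        # Admit waiting requests into free slots before each decode step.
        while waiting and len(running) < batch_size:
            running.append(waiting.popleft())
        # One decode step: every running request emits one token.
        for r in running:
            r.generated += 1
        # Retire finished requests immediately, freeing their slots.
        running = [r for r in running if r.generated < r.max_new_tokens]
        steps += 1
    return steps

# Three requests needing 3, 5, and 2 tokens on a 2-slot batch:
# continuous batching finishes in 5 decode steps, whereas static
# batching would take max(3, 5) + 2 = 7.
reqs = [Request(0, 3), Request(1, 5), Request(2, 2)]
print(serve(reqs, batch_size=2))  # 5
```

In a real engine the decode step is a batched GPU forward pass, so retiring a finished sequence mid-batch directly raises GPU utilization and throughput.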

Tech Stack · 20 technologies

What FriendliAI Is Building

Challenges

  • Scaling team to meet market demand
  • Cost-aware architectural decisions
  • Balancing performance, reliability, and cost
  • Performance bottlenecks
  • Latency-critical inference
  • Technical debt
  • System reliability
  • Scaling web platform
  • Optimizing GPU kernels for low-latency inference
  • Cross-vendor performance parity between NVIDIA and AMD hardware

Active Projects

  • GPU-accelerated AI inference platform
  • Multi-modal model pipelines
  • AI inference platform core systems
  • Agent execution platform
  • Proprietary inference engine for 450k models
  • Custom GPU kernel development for transformer and diffusion workloads
  • Kernel compiler and runtime development
  • Performance profiling and benchmarking infrastructure
  • Web platform for deploying multimodal models
  • Workload observability

Hiring Activity

Minimal · 10 roles · 0 in 30d

Department

  • Engineering: 6
  • Marketing: 1

Seniority

  • Senior: 4
  • Mid: 2
  • Staff: 1

About FriendliAI

FriendliAI builds an inference platform optimized for running open-weight and custom AI models on GPU infrastructure. Founded in 2021 and based in San Francisco, the company targets AI engineers and ML teams seeking production-grade model deployment with lower latency and cost than closed-model APIs. The platform includes a proprietary inference engine, custom GPU kernel development, and a web interface for multi-modal model deployment. The company operates as a small, engineering-heavy organization with active hiring in the United States and South Korea.

Headquarters: San Francisco, California
Company Size: 11–50 employees
Founded: 2021
Hiring Markets: United States, South Korea

Frequently Asked Questions

What is FriendliAI's tech stack?

Core languages: Python, Rust, C++, Go. GPU/compute: NVIDIA CUDA, ROCm, Triton. Serving: FastAPI, gRPC, GraphQL. Data/ops: PostgreSQL, Kubernetes, OpenTelemetry. ML tooling: Hugging Face. Frontend: React, Next.js, TypeScript.

What is FriendliAI working on?

Multi-modal inference pipelines, custom GPU kernel optimization, an agent execution platform, a proprietary engine supporting 450k models, and a web platform for model deployment. Focus areas include low-latency inference, performance profiling, and cross-vendor GPU support (NVIDIA and AMD).
