
Baseten Tech Stack

AI inference platform with global GPU infrastructure and 99.99% uptime

Software Development · San Francisco, CA · 51–200 employees · Privately Held

Baseten operates an inference-focused AI infrastructure platform built on Kubernetes, Docker, and NVIDIA hardware and software (CUDA, TensorRT, TensorRT-LLM). The tech stack points to a production-grade distributed system: Prometheus, Loki, and Grafana for observability, while active adoption of RDMA and InfiniBand signals investment in high-performance interconnects for multi-GPU workloads. Hiring is heavily weighted toward senior and mid-level engineers (78 of 150 open roles are in engineering, with 75 posted in the last 30 days), paired with deliberate builds of customer-facing functions (sales, product, HR infrastructure), indicating a transition from a pure engineering product toward go-to-market maturity.

Tech Stack (79 technologies)

Adopting: Cursor, RDMA, InfiniBand, Claude, Codex, NVLink, NVIDIA

What Baseten Is Building

Challenges

  • Capacity management
  • Scalable infrastructure
  • Improving engineering productivity
  • Ensuring SOC 2 compliance
  • Technical friction
  • Reliability of ML workloads
  • Performance bottlenecks in ML inference
  • Deploying AI models at scale
  • Building foundation and processes for customer-facing teams
  • Scaling large language models

Active Projects

  • Baseten inference stack
  • Fastest, most accurate Whisper transcription
  • Developing sales pipeline
  • Model APIs for frontier models
  • New deployments
  • Building foundation and processes for customer-facing teams
  • Operational debugging issues
  • Annual bonus planning
  • Go-to-market strategy for fastest growing customer segment
  • Distributed training infrastructure for large-scale foundation models

Hiring Activity

Accelerating · 150 roles · 75 in 30d

Department

  • Engineering: 78
  • Sales: 18
  • HR: 10
  • Product: 8
  • Finance: 6
  • Marketing: 6
  • Data: 4
  • Ops: 3

Seniority

  • Senior: 78
  • Mid: 34
  • Manager: 14
  • Junior: 11
  • Lead: 5

About Baseten

Baseten provides inference infrastructure for AI teams deploying large language models and foundation models at scale. The platform combines a proprietary inference stack (leveraging TensorRT and NVIDIA optimizations) with managed Kubernetes infrastructure across AWS and GCP, targeting global availability and high SLA commitments. Tracked pain points center on capacity management, ML workload reliability, and performance bottlenecks in inference: the same core problems its customers face. The company has 51–200 employees, is based in San Francisco, and is actively building its sales pipeline and go-to-market strategy alongside core infrastructure development.

Headquarters: San Francisco, CA
Company Size: 51–200 employees
Hiring Markets: United States, Canada

Frequently Asked Questions

What is Baseten's tech stack?

Core: NVIDIA (CUDA, TensorRT, TensorRT-LLM), Kubernetes, Docker, Python, Go. Infrastructure: AWS, GCP, Terraform, CloudFormation, Pulumi. Observability: Prometheus, Grafana, Loki, OpenTelemetry. CI/CD: GitHub Actions, GitLab CI/CD, CircleCI, Jenkins. Frontend: React, TypeScript, WebSockets. Adopting: RDMA, InfiniBand, NVLink.
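As an illustration of how several of these pieces commonly fit together, here is a generic Kubernetes Deployment fragment that requests an NVIDIA GPU and exposes a metrics port for Prometheus scraping. This is a minimal sketch using conventional placeholder names and annotations, not Baseten's actual configuration.

```yaml
# Generic sketch (placeholder names, not Baseten's manifests):
# a GPU-backed model server with Prometheus scrape annotations.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-model-server        # placeholder
spec:
  replicas: 1
  selector:
    matchLabels:
      app: example-model-server
  template:
    metadata:
      labels:
        app: example-model-server
      annotations:
        prometheus.io/scrape: "true"   # conventional scrape hint
        prometheus.io/port: "8000"
    spec:
      containers:
        - name: server
          image: registry.example.com/model-server:latest  # placeholder
          ports:
            - containerPort: 8000
          resources:
            limits:
              nvidia.com/gpu: 1   # schedule onto a GPU node
```

The `nvidia.com/gpu` resource name is the standard one registered by the NVIDIA device plugin; the `prometheus.io/*` annotations are a widely used convention for annotation-based service discovery.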

What is Baseten working on?

Core: Baseten inference stack, model APIs for frontier models, and distributed training infrastructure for large-scale foundation models. Current focus: fastest and most accurate Whisper transcription, new deployments, developing the sales pipeline, and go-to-market strategy for the fastest-growing customer segments.
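Model APIs of this kind typically expose an OpenAI-compatible HTTP interface. The sketch below assembles such a chat-completions request using only the standard library; the endpoint URL, model name, and bearer token are placeholders, not Baseten's actual values.

```python
# Hypothetical sketch of an OpenAI-compatible chat request.
# URL, model name, and API key are placeholders.
import json
import urllib.request


def build_chat_request(base_url, model, prompt):
    """Assemble an OpenAI-style chat-completions request (no network I/O)."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": "Bearer $API_KEY",  # placeholder token
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_chat_request("https://api.example.com", "example-llm", "Hello")
print(req.full_url)  # https://api.example.com/v1/chat/completions
```

Sending the request (e.g. with `urllib.request.urlopen(req)`) would return a JSON body whose generated text sits under `choices[0].message.content` in the OpenAI-compatible schema.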
