Byte-native AI models and serverless LLM serving infrastructure
Sciforium builds foundation models and LLM serving infrastructure with a deep systems focus: the stack spans PyTorch, JAX, CUDA, ROCm, vLLM, and distributed training frameworks (Ray, Kubernetes). The research-heavy org (5 engineers, 1 researcher, 1 data scientist) is tackling inference latency, distributed training optimization, and GPU kernel development, work that reflects a build-from-metal approach rather than API-wrapping. Backing from AMD signals hardware-partnership potential.
Sciforium is a San Francisco-based AI infrastructure company founded in 2024, developing byte-native multimodal foundation models and serverless LLM serving platforms. The company operates as a small, research-forward team focused on reducing the cost and complexity of large language model deployment. Core projects span large language model research, generative media, scalable training systems, and distributed inference optimization. They serve teams building or deploying frontier AI models at scale.
The stack spans PyTorch, JAX, CUDA, ROCm, vLLM, Ray, Kubernetes, Flax, XLA, and C++. Hardware includes TPU and AMD accelerators, with NVIDIA Nsight for profiling. Distributed storage runs on Lustre, GPFS, and NFS.
Work spans large language model research, generative media, distributed training system optimization, GPU kernel development, model serving platforms, and distributed inference systems, with a core focus on scaling infrastructure and optimizing inference latency.
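The inference-optimization work above typically starts from simple memory arithmetic: how much GPU memory the KV cache consumes per token determines how many concurrent requests a serving node can batch. A minimal sketch of that budgeting, with all model parameters hypothetical (a 7B-class, fp16 configuration chosen for illustration, not drawn from the source):

```python
# Back-of-envelope KV-cache sizing, the arithmetic behind serving-capacity
# and batching decisions. All parameters below are hypothetical.

def kv_cache_bytes_per_token(num_layers: int, num_kv_heads: int,
                             head_dim: int, dtype_bytes: int) -> int:
    """Bytes one token's KV cache occupies: keys + values across all layers."""
    return 2 * num_layers * num_kv_heads * head_dim * dtype_bytes

def max_concurrent_tokens(gpu_mem_bytes: int, weight_bytes: int,
                          per_token_bytes: int) -> int:
    """Tokens of context that fit after model weights are loaded."""
    return (gpu_mem_bytes - weight_bytes) // per_token_bytes

# Hypothetical 7B-class model: 32 layers, 32 KV heads, head dim 128, fp16.
per_token = kv_cache_bytes_per_token(32, 32, 128, 2)
print(per_token)  # 524288 bytes = 512 KiB per token

# Hypothetical 80 GB GPU holding ~14 GB of fp16 weights.
budget = max_concurrent_tokens(80 * 1024**3, 14 * 1024**3, per_token)
print(budget)  # 135168 tokens of total batched context
```

Numbers like these are why serving stacks such as vLLM focus on paged KV-cache management: the cache, not the weights, usually bounds batch size.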