NEW: Announcing our $4M seed

The fastest open-source LLMs

Serverless and dedicated inference for the world's fastest open-source LLMs. Run GLM, Kimi, Qwen, and more through a single API.

Loved by
AMD
AWS
DigitalOcean
Parasail
PipeShift
Ollama
+ more
“I'm very impressed. This is 10x better than redacted competitor!”
Garry Tan
President, Y Combinator
Featured

Wafer Serverless

Pay-as-you-go access to the fastest open models. Billed per token, no minimums, no commitment.


Per-model rates

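GLM-5.1 (Z.AI): $1.50 / 1M input tokens, $4.50 / 1M output tokens
Qwen3.5-397B-A17B (Qwen): $0.60 / 1M input tokens, $3.60 / 1M output tokens
Qwen3.6-35B-A3B (Qwen): $0.19 / 1M input tokens, $1.25 / 1M output tokens
Kimi-K2.6 (Kimi): $1.10 / 1M input tokens, $4.80 / 1M output tokens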
and more
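
To make per-token billing concrete, here is a minimal Python sketch that estimates the cost of a request from the rates listed above. The function name `estimate_cost` and the token counts in the usage line are hypothetical and chosen only for illustration. The sketch assumes billing is a straight sum of input and output tokens at the listed rates, with no rounding or other charges.

```python
# Rates in USD per 1M tokens (input, output), copied from the list above.
RATES = {
    "GLM-5.1": (1.50, 4.50),
    "Qwen3.5-397B-A17B": (0.60, 3.60),
    "Qwen3.6-35B-A3B": (0.19, 1.25),
    "Kimi-K2.6": (1.10, 4.80),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for one request.

    Assumption: cost is input and output tokens times their per-1M rates,
    with nothing else added.
    """
    in_rate, out_rate = RATES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


if __name__ == "__main__":
    # Hypothetical request: 2,000 input tokens and 8,000 output tokens.
    cost = estimate_cost("Qwen3.5-397B-A17B", 2_000, 8_000)
    print(f"${cost:.4f}")  # prints $0.0300
```

Output tokens cost several times more than input tokens on every model listed, so long generations dominate the bill.
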
Backed by
Fifty Years
Y Combinator
Liquid 2
Jeff Dean, Chief Scientist at Google
Woj Zaremba, Co-Founder at OpenAI
Dan Fu, Head of Kernels at Together
Charlie Songhurst, Meta Board of Directors
Arash Ferdowsi, Co-Founder at Dropbox
Kawal Gandhi, Office of the CTO at Google
NVIDIA Inception
Performance

AI that optimizes AI

Wafer agents autonomously profile, diagnose, and optimize inference across the entire stack. This means we can run the fastest AI on the planet on any AI hardware.

2.8x faster than base SGLang
Example: output throughput · Qwen3.5-397B · Input/Output: 1600 / 7000
Wafer: 408.4 tok/s
Base SGLang: 144.8 tok/s
(tok/s, higher is better)
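
The headline speedup follows directly from the two throughput figures in the example (408.4 and 144.8 tok/s). The short Python sketch below reproduces that arithmetic. The helper `seconds_for` is hypothetical, and it assumes the reported throughput applies to a single stream of output, which the example does not state.

```python
# Output throughput figures from the example above, in tokens per second.
WAFER_TOK_S = 408.4
BASE_SGLANG_TOK_S = 144.8

# Ratio of the two throughputs; rounds to the headline 2.8x.
speedup = WAFER_TOK_S / BASE_SGLANG_TOK_S


def seconds_for(output_tokens: int, tok_per_s: float) -> float:
    """Time to emit `output_tokens` at a constant rate.

    Assumption: the throughput figure describes one output stream.
    """
    return output_tokens / tok_per_s


if __name__ == "__main__":
    print(f"speedup: {speedup:.2f}x")  # prints speedup: 2.82x
    print(f"Wafer: {seconds_for(7000, WAFER_TOK_S):.1f} s")
    print(f"Base SGLang: {seconds_for(7000, BASE_SGLANG_TOK_S):.1f} s")
```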

Enterprise inference optimization

Get set up with the best performance for any custom model, with inference optimization tailored to your hardware, workloads, and production constraints, in less than 24 hours.