The fastest open-source LLMs

Serverless and dedicated inference for the world's fastest open-source LLMs. Run GLM, Kimi, Qwen, and more through a single API.


Loved by


“I'm very impressed. This is 10x better than redacted competitor!”

Garry Tan

President, Y Combinator

Featured

Wafer Serverless

Pay-as-you-go access to the fastest open models. Billed per token, no minimums, no commitment.

GLM-5.1

Qwen3.5-397B-A17B

Qwen3.6-35B-A3B

Kimi-K2.6

and others


Per-model rates

GLM-5.1: $1.50 / 1M input tokens, $4.50 / 1M output tokens

Qwen3.5-397B-A17B: $0.60 / 1M input tokens, $3.60 / 1M output tokens

Qwen3.6-35B-A3B: $0.19 / 1M input tokens, $1.25 / 1M output tokens

Kimi-K2.6: $1.10 / 1M input tokens, $4.80 / 1M output tokens

and more
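As a rough illustration of per-token billing, the sketch below estimates the cost of a single request from the rates listed above. The helper name `estimate_cost` and the request size are assumptions made for the example, not part of any Wafer SDK, and actual invoices may round or meter tokens differently.

```python
# Rates in USD per 1M tokens (input, output), copied from the per-model rates above.
RATES_PER_MILLION = {
    "GLM-5.1": (1.50, 4.50),
    "Qwen3.5-397B-A17B": (0.60, 3.60),
    "Qwen3.6-35B-A3B": (0.19, 1.25),
    "Kimi-K2.6": (1.10, 4.80),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request (hypothetical helper, not a Wafer API)."""
    rate_in, rate_out = RATES_PER_MILLION[model]
    return (input_tokens * rate_in + output_tokens * rate_out) / 1_000_000


# Assumed request size: 1,600 input tokens and 7,000 output tokens.
cost = estimate_cost("Qwen3.5-397B-A17B", 1600, 7000)
print(f"Estimated cost: ${cost:.5f}")
```

With these rates, output tokens dominate the bill for long generations, so the output rate is usually the number to compare across models.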

Backed by

Fifty Years

Y Combinator

Liquid 2

Jeff Dean, Chief Scientist at Google

Woj Zaremba, Co-Founder at OpenAI

Dan Fu, Head of Kernels at Together

Charlie Songhurst, Meta Board of Directors

Arash Ferdowsi, Co-Founder at Dropbox

Kawal Gandhi, Office of the CTO at Google

NVIDIA Inception

Performance

AI that optimizes AI

Wafer agents autonomously profile, diagnose, and optimize inference across the entire stack. This means we can run the fastest AI on the planet on any AI hardware.

2.8x faster than base SGLang

Example

Output throughput · Qwen3.5-397B · Input/Output: 1600 / 7000

Wafer: 408.4 tok/s

Base SGLang: 144.8 tok/s

(higher is better)
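The 2.8x figure follows directly from the two throughput numbers in this example. The short sketch below reproduces the ratio and, under the assumption that the reported tok/s is the rate at which a single 7,000-token response is generated (the page does not say whether the figure is per request or aggregate), estimates how long that response would take on each stack.

```python
# Output throughput figures from the example above, in tokens per second.
wafer_tok_s = 408.4
base_sglang_tok_s = 144.8

# Output length from the example (Input/Output: 1600 / 7000).
output_tokens = 7000

# Speedup is the ratio of the two throughputs.
speedup = wafer_tok_s / base_sglang_tok_s

# Assumption: tok/s is a per-request generation rate, so time = tokens / rate.
wafer_seconds = output_tokens / wafer_tok_s
base_seconds = output_tokens / base_sglang_tok_s

print(f"Speedup: {speedup:.1f}x")
print(f"Wafer: {wafer_seconds:.1f} s, base SGLang: {base_seconds:.1f} s")
```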

Enterprise inference optimization

Get set up with the best performance for any custom model, with inference optimization tailored to your hardware, workloads, and production constraints, in less than 24 hours.