
The fastest open models

AI that optimizes AI. 1.5–5x faster inference on any hardware.

Wafer Pass (Beta)
Qwen3.5-Turbo
Use with: Claude Code, OpenClaw, Cline, Roo Code, Kilo Code, OpenHands
Backed by
Fifty Years
Y Combinator
Liquid 2
Jeff Dean, Chief Scientist at Google
Woj Zaremba, Co-Founder at OpenAI
Dan Fu, Head of Kernels at Together
Charlie Songhurst, Meta Board of Directors
Arash Ferdowsi, Co-Founder at Dropbox
Kawal Gandhi, Office of the CTO at Google
NVIDIA Inception

AI that optimizes AI

Wafer agents autonomously profile, diagnose, and optimize inference across the entire stack. This means we can run the fastest AI on the planet on any AI hardware.

2.8x faster than base SGLang
Output throughput · Qwen3.5-397B · Input/Output: 1600 / 7000
Wafer: 408.4 tok/s · Base SGLang: 144.8 tok/s (higher is better)
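The headline multiplier is just the ratio of the two measured throughputs. A minimal sketch of that arithmetic, using the figures from the benchmark above (the function name is illustrative, not part of any Wafer API):

```python
def speedup(optimized_tok_s: float, baseline_tok_s: float) -> float:
    """Throughput speedup of an optimized stack over a baseline."""
    return optimized_tok_s / baseline_tok_s

# Output-throughput figures from the Qwen3.5-397B benchmark (tok/s).
wafer = 408.4
base_sglang = 144.8

print(f"{speedup(wafer, base_sglang):.1f}x")  # 408.4 / 144.8 ≈ 2.8x
```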

Optimize any AI model, for any AI hardware.

A single AI agent that optimizes across every hardware platform, delivering the fastest inference at the lowest price, always.

ASICs

Chip Companies

You built world-class hardware. We build the software that unlocks it.

Custom Agents that optimize kernels, enable new model architectures, and scale your developer ecosystem.

Cloud Providers

When a new model drops, be first on the leaderboard.

Custom Agents that optimize every model on your hardware, so your inference is the fastest and cheapest possible.

AI Labs

Your models, running as fast and cheap as possible, everywhere.

End-to-end inference optimization across every deployment target.

Maximize intelligence per watt.

AI systems today run orders of magnitude below what’s physically possible. The only way to close that gap at scale is AI that optimizes AI infrastructure.

Read the manifesto