The fastest open source LLMs
Wafer's autonomous agents optimize inference across the entire stack, delivering the fastest and cheapest open models on the planet.

Wafer Pass
Flat rate, cancel anytime. Access all the fastest frontier open source LLMs. For personal usage only.
Starter
Solo devs, daily agents
- 1,000 requests per 5-hour window
- Access to every model Wafer hosts
Built for personal agentic usage.
Get Started
Privacy
Production agents, private workloads
- 2,000 requests per 5-hour window
- Zero Data Retention
Built for personal agentic usage.
Get Privacy
By purchasing, you agree to our Terms of Service and Privacy Policy.
Wafer Serverless
Billed per token.
Per-model rates
Cache-read tokens billed at 10% of input. No minimums, no commitment.
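The cache-read discount makes a request's cost a simple weighted sum. A sketch of that arithmetic, using hypothetical placeholder rates (Wafer's actual per-model rates are not stated here):

```python
def serverless_cost(input_tokens, cache_read_tokens, output_tokens,
                    input_rate_per_m, output_rate_per_m):
    """Cost in dollars for one request under per-token billing.

    Rates are dollars per million tokens. Cache-read tokens are
    billed at 10% of the input rate, per the Serverless pricing.
    """
    cache_rate_per_m = 0.10 * input_rate_per_m
    return (input_tokens * input_rate_per_m
            + cache_read_tokens * cache_rate_per_m
            + output_tokens * output_rate_per_m) / 1_000_000

# Hypothetical rates of $0.50/M input and $1.50/M output:
# 100k fresh input + 400k cache reads + 20k output tokens.
cost = serverless_cost(100_000, 400_000, 20_000, 0.50, 1.50)
```

With those placeholder numbers, the 400k cache-read tokens cost the same as 40k fresh input tokens, which is where the savings for long, stable prompts come from.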

AI that optimizes AI
Wafer agents autonomously profile, diagnose, and optimize inference across the entire stack. The result: the fastest inference on the planet, on any AI hardware.
Enterprise inference optimization
Get best-in-class performance for any custom model, with inference optimization tailored to your hardware, workloads, and production constraints. Set up in less than 24 hours.