Wafer, Inc. (“Wafer,” “we,” “us”) operates an inference platform that serves optimized open-source large language models via API. This Privacy Policy explains what data we collect when you use our services at wafer.ai and our API endpoints, how we use it, and the choices you have.
1. Scope
This policy applies to:
- Our website (wafer.ai and subdomains).
- Our inference API, including Wafer Pass and any dedicated or enterprise endpoints.
- Any account or billing interactions you have with us directly or through a reseller (e.g., an API aggregator).
It does not cover third-party services that integrate with Wafer (such as agent harnesses, IDE extensions, or aggregators); those services have their own privacy policies.
2. Data We Collect
2.1 Account and billing data
When you sign up or subscribe, we collect your name, email, and billing information. Payments are processed by our payment provider; we do not store full credit card numbers on our servers.
2.2 API request data
When you call our API, we process:
- Prompts, completions, and any tool/function call payloads you send or receive.
- Request metadata such as model, timestamp, token counts, latency, IP address, user agent, and API key identifier.
- Error traces and diagnostic information when a request fails.
2.3 Website data
When you visit wafer.ai, we collect standard web analytics (pages viewed, referrer, approximate location derived from IP, device and browser type). We use cookies only where needed to operate the site and for basic analytics.
3. How We Use Data
We use the data above to:
- Serve your inference requests and return responses.
- Operate, monitor, and improve the reliability, performance, and security of our systems.
- Investigate abuse, fraud, and violations of our Terms of Service.
- Bill for usage and manage your account.
- Communicate with you about your account, service changes, and (if you opt in) product updates.
4. Model Training
We do not train the models we serve on your prompts or completions. Inputs and outputs from your API requests are never used to train, fine-tune, or otherwise improve the capabilities or behavior of the open-source models we serve, and we do not share your prompts or completions with any third-party model provider for training.
Speculative decoding. To make inference faster and cheaper, we train small internal “draft” models used only for speculative decoding. These draft models propose candidate tokens that the main model then verifies, so every token you receive is still produced by the open-source model you selected. Speculative decoding does not change model outputs; it only accelerates them. We may use prompts and completions from traffic that is not subject to Zero Data Retention or another contractual retention restriction to train and evaluate these draft models. Draft models are used internally for serving performance, are not exposed as a product, and are not shared with third parties. Traffic covered by Zero Data Retention in Section 6 is excluded from draft-model training.
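For readers who want to see why verification leaves outputs unchanged, the following is a minimal illustrative sketch of the greedy case only. It is not Wafer's serving code: the names `speculative_decode`, `greedy_decode`, `target`, and `draft` are hypothetical, and the "models" are toy functions that map a token sequence to the next token. Production systems typically verify all drafted tokens in one batched forward pass; the sketch checks them one at a time for clarity.

```python
from typing import Callable, List

# Hypothetical stand-in for a model: given the tokens so far, return the
# next token it would choose under greedy decoding.
NextToken = Callable[[List[int]], int]


def greedy_decode(target: NextToken, prompt: List[int], max_new: int) -> List[int]:
    """Baseline: the target model generates every token by itself."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(target(out))
    return out[len(prompt):]


def speculative_decode(target: NextToken, draft: NextToken,
                       prompt: List[int], max_new: int, k: int = 4) -> List[int]:
    """Draft proposes up to k tokens; the target verifies each in order."""
    out = list(prompt)
    while len(out) - len(prompt) < max_new:
        # Step 1: the draft model guesses a short run of candidate tokens.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))
        # Step 2: the target checks each guess. The emitted token is always
        # the target's own choice, so a wrong guess costs speed, not accuracy.
        for guess in proposal:
            expected = target(out)
            out.append(expected)
            if expected != guess or len(out) - len(prompt) >= max_new:
                break  # discard the remaining guesses and draft again
    return out[len(prompt):]


# Toy models (assumptions for illustration): the draft agrees with the
# target on some positions and guesses 0 on the others.
def target(seq: List[int]) -> int:
    return (sum(seq) * 31 + len(seq)) % 50


def draft(seq: List[int]) -> int:
    return target(seq) if len(seq) % 3 else 0
```

In this sketch, `speculative_decode(target, draft, [1, 2, 3], 10)` returns the same tokens as `greedy_decode(target, [1, 2, 3], 10)` however often the draft guesses wrong, because only the target's choices are ever emitted.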
We also use aggregated, non-content metadata (e.g., token counts, latency distributions, error rates) to improve our kernel optimization and serving infrastructure.
5. Data Retention
For Wafer Pass Starter and other standard offerings not covered by Zero Data Retention, request logs containing prompts and completions may be stored in an anonymized form, stripped of direct account identifiers beyond an internal request ID, for up to 30 days to detect, prevent, and investigate abuse, then deleted.
For Wafer Pass Privacy and grandfathered accounts with Zero Data Retention enabled, prompts and completions are not written to durable storage after request processing. We retain only the operational metadata needed to run, secure, and bill the service.
Other inference API offerings, including dedicated, enterprise, or custom endpoints, may have different retention terms based on the product, endpoint configuration, or customer agreement that applies to that service.
Operational metadata (token counts, latency, error codes, and related service diagnostics) may be retained longer for billing reconciliation, capacity planning, service security, and abuse prevention.
Account and billing records are retained for as long as your account is active and for the period required by applicable tax and accounting laws after closure.
6. Wafer Pass Privacy - Zero Data Retention (ZDR)
Wafer Pass Privacy includes Zero Data Retention. For Wafer Pass Privacy and accounts we explicitly grandfather into ZDR:
- Prompts and completions are not written to durable storage after request processing.
- Prompts and completions are not used for the speculative-decoding draft-model training described in Section 4.
- Only minimal metadata required to operate the service, bill for usage, and prevent abuse, such as token counts, timestamps, request IDs, model, and latency, may be retained longer.
This ZDR commitment applies only where Wafer Pass Privacy, grandfathered ZDR access, or a customer agreement explicitly says it applies. Other Wafer inference API products may have different retention terms by product, endpoint, or contract.
7. Sharing and Disclosure
We share data only as needed to run the service:
- Infrastructure providers. We run inference on cloud and hardware partners under contractual confidentiality and data protection terms. These providers process data only on our instructions.
- Payment and business tools. We share limited data with our payment processor, billing platform, and business communications tools.
- Legal requests. We may disclose data to comply with valid legal process, to protect our rights or property, or to protect the safety of users or the public.
- Business transfers. If Wafer is involved in a merger, acquisition, or asset sale, your data may be transferred as part of that transaction, subject to this policy.
We do not sell personal data, and we do not share prompts or completions with advertisers or data brokers.
8. Security
We use industry-standard safeguards to protect your data, including TLS encryption for data in transit, encryption at rest for stored logs, access controls on internal systems, and audit logging. No system is perfectly secure; we cannot guarantee absolute security, but we work to meet or exceed industry norms for inference providers.
9. Your Rights and Choices
Depending on where you live, you may have rights to access, correct, delete, or port your personal data, or to object to or restrict certain processing. To exercise these rights, email us at the address below. We will verify your identity before acting on a request.
You can also:
- Delete your API keys at any time from your account.
- Request deletion of your account and associated logs, subject to legal retention requirements.
- Opt out of marketing emails using the unsubscribe link in any such email.
10. International Users
Inference is served from data centers in the United States. If you access Wafer from outside the U.S., your data will be transferred to and processed in the U.S., which may have different data protection laws than your jurisdiction. By using the service, you consent to this transfer.
11. Children's Privacy
Wafer is not directed to children under 13 (or the equivalent minimum age in your jurisdiction), and we do not knowingly collect personal data from them. If you believe a child has provided us personal data, contact us and we will delete it.
12. Changes to This Policy
We may update this policy from time to time. We will post the new version at wafer.ai/privacy and update the “Last updated” date. For material changes, we will provide additional notice (e.g., by email or an in-product banner) before the changes take effect.
13. Account Use and Acceptable Use
Wafer Pass is licensed for personal use by a single individual. To keep the service fast and fair for everyone, we enforce the following limits on every Wafer Pass subscription:
- One account per person. Each individual may hold only one active Wafer Pass subscription. Sharing your account, your API key, or your subscription with anyone else, or operating multiple Wafer Pass accounts as the same person, is not permitted.
- Concurrency limit. Wafer Pass enforces a per-account concurrency limit of 3 in-flight inference requests at any given time. Requests beyond this limit may be queued or rejected until earlier requests complete.
- No reselling or pooling. You may not resell access to your Wafer Pass account, proxy it for third parties, or pool a single subscription across a team or organization. Teams and organizations should contact us at emilio@wafer.ai about a dedicated or enterprise plan.
The contractual terms that govern your Wafer Pass subscription, including billing, refunds, cancellation, enforcement of the limits above, and our right to suspend or terminate accounts that violate them, are described in our Terms of Service.
14. Contact Us
Questions, requests, or complaints about this policy can be sent to: