MiniMax-M2.7 is Live on Wafer Pass
Wafer Pass now serves MiniMax-M2.7 live with a 204,800 token context window, built for long-context coding agents and production engineering workflows.

MiniMax-M2.7 is now live on Wafer Pass.
This is one of the models we have been most excited to bring onto Pass. MiniMax-M2.7 is a sparse MoE model with 230B total parameters and roughly 10B active parameters per token, tuned for agentic coding, tool use, and long-horizon productivity workflows. It is exactly the kind of model that benefits from a serving stack built around sustained throughput, long context, and operational observability.
We are serving it as MiniMax-M2.7 with a 204,800 token context window. You can see current Pass plans on the Wafer pricing page, or jump straight into the Wafer Pass setup guide.
Why MiniMax-M2.7
Most model launches are still optimized around chat demos. MiniMax-M2.7 is more interesting for the workloads that show up after the demo: repository-scale coding agents, debugging loops, planning over large documents, and workflows where the model has to keep tool state and project context alive for a long time.
The model's long context window makes it useful for:
- Reading a large codebase without aggressively chopping it into small fragments
- Keeping issue history, traces, logs, and diffs in the same prompt
- Running agent loops where intermediate decisions matter
- Handling production debugging workflows that mix code, telemetry, and runbooks
Those are the same workloads Wafer Pass is built for.
How To Use It
Use the model ID MiniMax-M2.7 with the OpenAI-compatible Pass endpoint. For setup instructions specific to Claude Code, Codex, Cline, Roo Code, Kilo Code, OpenHands, LibreChat, and other harnesses, see the Wafer Pass docs.
curl https://pass.wafer.ai/v1/chat/completions \
  -H "Authorization: Bearer $WAFER_PASS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "MiniMax-M2.7",
    "messages": [
      {
        "role": "user",
        "content": "Summarize the architecture of this repository."
      }
    ],
    "max_tokens": 1024
  }'
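Because Pass exposes an OpenAI-compatible endpoint, the same request also works through the standard OpenAI Python SDK. A minimal sketch, pointing the client's base_url at Pass; the URL, key variable, and model ID come from the curl example above, and the rest is ordinary SDK usage:

import os

from openai import OpenAI

# Point the standard OpenAI client at the Pass endpoint.
client = OpenAI(
    base_url="https://pass.wafer.ai/v1",
    api_key=os.environ["WAFER_PASS_API_KEY"],
)

response = client.chat.completions.create(
    model="MiniMax-M2.7",
    messages=[
        {"role": "user", "content": "Summarize the architecture of this repository."}
    ],
    max_tokens=1024,
)

print(response.choices[0].message.content)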
Context Length
MiniMax-M2.7 is served with a 204,800 token context window. For long-context requests, leave enough room for the expected completion inside the same 204,800 token budget: a 200,000 token prompt, for example, leaves roughly 4,800 tokens for the model to generate.
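If you want to enforce that budget in client code, one simple guard is to clamp max_tokens to whatever the prompt leaves over. A minimal sketch, assuming prompt_tokens is an estimate you compute yourself (exact counts depend on the model's tokenizer):

CONTEXT_WINDOW = 204_800  # tokens, per the serving configuration above

def clamp_max_tokens(prompt_tokens: int, desired_completion: int) -> int:
    """Cap the completion budget so prompt + completion fits the window."""
    remaining = CONTEXT_WINDOW - prompt_tokens
    if remaining <= 0:
        raise ValueError("prompt already fills the context window")
    return min(desired_completion, remaining)

# A 200,000-token prompt leaves room for at most 4,800 completion tokens.
assert clamp_max_tokens(200_000, 8_192) == 4_800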
What Comes Next
The next target is throughput: collecting live telemetry, watching real request behavior, and tuning concurrency, router policy, and pricing around the usage we actually see.
If you want to run long-context coding agents on MiniMax-M2.7, get Wafer Pass or review the pricing and plan details.