
MiniMax-M2.7 is Live on Wafer Pass

Wafer Pass now serves MiniMax-M2.7 live with a 204,800-token context window, built for long-context coding agents and production engineering workflows.

May 6, 2026 · Steven Arellano
MiniMax-M2.7 banner for Wafer Pass

MiniMax-M2.7 is now live on Wafer Pass.

This is one of the models we have been most excited to bring onto Pass. MiniMax-M2.7 is a sparse MoE model with 230B total parameters and roughly 10B active parameters per token, tuned for agentic coding, tool use, and long-horizon productivity workflows. It is exactly the kind of model that benefits from a serving stack built around sustained throughput, long context, and operational observability.

We are serving it as MiniMax-M2.7 with a 204,800-token context window. You can see current Pass plans on the Wafer pricing page, or jump straight into the Wafer Pass setup guide.

Why MiniMax-M2.7

Most model launches are still optimized around chat demos. MiniMax-M2.7 is more interesting for the workloads that show up after the demo: repository-scale coding agents, debugging loops, planning over large documents, and workflows where the model has to keep tool state and project context alive for a long time.

The model's long context window makes it useful for:

  • Reading a large codebase without aggressively chopping it into small fragments
  • Keeping issue history, traces, logs, and diffs in the same prompt
  • Running agent loops where intermediate decisions matter
  • Handling production debugging workflows that mix code, telemetry, and runbooks

Those are the same workloads Wafer Pass is built for.
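
To make that concrete, here is a minimal Python sketch of packing several repository files and a log into a single long-context request. The endpoint, model ID, and WAFER_PASS_API_KEY variable come from this post; the file paths and the four-characters-per-token estimate are illustrative assumptions, not part of the Pass API.

```python
# Minimal sketch: pack repo files and logs into one long-context request.
# File paths are hypothetical; the chars-per-token ratio is a rough
# heuristic, not a real tokenizer count.
import os
import pathlib
import requests

CONTEXT_BUDGET = 204_800      # served context window, in tokens
COMPLETION_RESERVE = 4_096    # room left for the model's answer
CHARS_PER_TOKEN = 4           # rough estimate, not exact

def pack_sources(paths, budget_tokens):
    """Concatenate files until the rough token estimate hits the budget."""
    parts, used = [], 0
    for path in paths:
        text = pathlib.Path(path).read_text(errors="ignore")
        cost = len(text) // CHARS_PER_TOKEN
        if used + cost > budget_tokens:
            break
        parts.append(f"### {path}\n{text}")
        used += cost
    return "\n\n".join(parts)

context = pack_sources(
    ["src/router.py", "src/scheduler.py", "logs/latest.log"],  # hypothetical
    CONTEXT_BUDGET - COMPLETION_RESERVE,
)

resp = requests.post(
    "https://pass.wafer.ai/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['WAFER_PASS_API_KEY']}"},
    json={
        "model": "MiniMax-M2.7",
        "messages": [
            {"role": "user",
             "content": context + "\n\nExplain the failure in logs/latest.log."}
        ],
        "max_tokens": COMPLETION_RESERVE,
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```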

How To Use It

Use the model ID MiniMax-M2.7 with the OpenAI-compatible Pass endpoint. For tool-specific setup instructions for Claude Code, Codex, Cline, Roo Code, Kilo Code, OpenHands, LibreChat, and other harnesses, see the Wafer Pass docs.

```bash
curl https://pass.wafer.ai/v1/chat/completions \
  -H "Authorization: Bearer $WAFER_PASS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "MiniMax-M2.7",
    "messages": [
      {
        "role": "user",
        "content": "Summarize the architecture of this repository."
      }
    ],
    "max_tokens": 1024
  }'
```
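
Because the endpoint is OpenAI-compatible, the official openai Python SDK should also work by pointing base_url at the Pass endpoint. A minimal sketch, assuming the same /v1 base path as the curl example above:

```python
# Same request via the openai Python SDK, with base_url overridden to the
# Pass endpoint. Assumes the standard OpenAI-compatible /v1 layout.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://pass.wafer.ai/v1",
    api_key=os.environ["WAFER_PASS_API_KEY"],
)

completion = client.chat.completions.create(
    model="MiniMax-M2.7",
    messages=[
        {"role": "user", "content": "Summarize the architecture of this repository."}
    ],
    max_tokens=1024,
)
print(completion.choices[0].message.content)
```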

Context Length

MiniMax-M2.7 is served with a 204,800-token context window. The prompt and the completion share that window, so for long-context requests, reserve enough room for the expected completion inside the same 204,800-token budget.
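
One way to keep that budget honest is to cap max_tokens based on an estimate of the prompt size. A minimal sketch; the four-characters-per-token heuristic is a rough assumption, and a real tokenizer count would be more accurate:

```python
# Cap max_tokens so the estimated prompt plus the completion stay inside
# the 204,800-token window. The chars-per-token ratio is an assumption.
CONTEXT_WINDOW = 204_800

def max_completion_tokens(prompt: str, chars_per_token: int = 4) -> int:
    prompt_tokens = len(prompt) // chars_per_token
    return max(0, CONTEXT_WINDOW - prompt_tokens)
```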

What Comes Next

The next target is throughput: collecting live telemetry, watching real request behavior, and tuning concurrency, router policy, and pricing around the usage we actually see.

If you want to run long-context coding agents on MiniMax-M2.7, get Wafer Pass or review the pricing and plan details.