
MiniMax-M2.7 is Live on Wafer Pass

Wafer Pass now serves MiniMax-M2.7 live with a 204,800-token context window, built for long-context coding agents and production engineering workflows.

May 6, 2026 · Steven Arellano
MiniMax-M2.7 banner for Wafer Pass

MiniMax-M2.7 is now live on Wafer Pass.

This is one of the models we have been most excited to bring onto Pass. MiniMax-M2.7 is a sparse MoE model with 230B total parameters and roughly 10B active parameters per token, tuned for agentic coding, tool use, and long-horizon productivity workflows. It is exactly the kind of model that benefits from a serving stack built around sustained throughput, long context, and operational observability.

We are serving it as MiniMax-M2.7 with a 204,800-token context window. You can see current Pass plans on the Wafer pricing page, or jump straight into the Wafer Pass setup guide.

Why MiniMax-M2.7

Most model launches are still optimized around chat demos. MiniMax-M2.7 is more interesting for the workloads that show up after the demo: repository-scale coding agents, debugging loops, planning over large documents, and workflows where the model has to keep tool state and project context alive for a long time.

The model's long context window makes it useful for:

  • Reading a large codebase without aggressively chopping it into small fragments
  • Keeping issue history, traces, logs, and diffs in the same prompt
  • Running agent loops where intermediate decisions matter
  • Handling production debugging workflows that mix code, telemetry, and runbooks

Those are the same workloads Wafer Pass is built for.
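
To make that concrete, here is a minimal Python sketch of packing several repository files and a log into a single long-context request. The endpoint, model ID, and WAFER_PASS_API_KEY variable come from this post; the file paths and the four-characters-per-token estimate are illustrative assumptions, not part of the Pass API.

```python
# Minimal sketch: pack repo files and logs into one long-context request.
# File paths are hypothetical; the chars-per-token ratio is a rough
# heuristic, not a real tokenizer count.
import os
import pathlib
import requests

CONTEXT_BUDGET = 204_800      # served context window, in tokens
COMPLETION_RESERVE = 4_096    # room left for the model's answer
CHARS_PER_TOKEN = 4           # rough estimate, not exact

def pack_sources(paths, budget_tokens):
    """Concatenate files until the rough token estimate hits the budget."""
    parts, used = [], 0
    for path in paths:
        text = pathlib.Path(path).read_text(errors="ignore")
        cost = len(text) // CHARS_PER_TOKEN
        if used + cost > budget_tokens:
            break
        parts.append(f"### {path}\n{text}")
        used += cost
    return "\n\n".join(parts)

context = pack_sources(
    ["src/router.py", "src/scheduler.py", "logs/latest.log"],  # hypothetical
    CONTEXT_BUDGET - COMPLETION_RESERVE,
)

resp = requests.post(
    "https://pass.wafer.ai/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['WAFER_PASS_API_KEY']}"},
    json={
        "model": "MiniMax-M2.7",
        "messages": [
            {"role": "user",
             "content": context + "\n\nExplain the failure in logs/latest.log."}
        ],
        "max_tokens": COMPLETION_RESERVE,
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```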

How To Use It

Use the model ID MiniMax-M2.7 with the OpenAI-compatible Pass endpoint. For tool-specific setup instructions for Claude Code, Codex, Cline, Roo Code, Kilo Code, OpenHands, LibreChat, and other harnesses, see the Wafer Pass docs.

```bash
curl https://pass.wafer.ai/v1/chat/completions \
  -H "Authorization: Bearer $WAFER_PASS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "MiniMax-M2.7",
    "messages": [
      {
        "role": "user",
        "content": "Summarize the architecture of this repository."
      }
    ],
    "max_tokens": 1024
  }'
```
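
Because the endpoint is OpenAI-compatible, the official openai Python SDK should also work by pointing base_url at the Pass endpoint. A minimal sketch, assuming the same /v1 base path as the curl example above:

```python
# Same request via the openai Python SDK, with base_url overridden to the
# Pass endpoint. Assumes the standard OpenAI-compatible /v1 layout.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://pass.wafer.ai/v1",
    api_key=os.environ["WAFER_PASS_API_KEY"],
)

completion = client.chat.completions.create(
    model="MiniMax-M2.7",
    messages=[
        {"role": "user", "content": "Summarize the architecture of this repository."}
    ],
    max_tokens=1024,
)
print(completion.choices[0].message.content)
```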

Context Length

MiniMax-M2.7 is served with a 204,800-token context window. The prompt and the completion share that window, so for long-context requests, reserve enough room for the expected completion inside the same 204,800-token budget.
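
One way to keep that budget honest is to cap max_tokens based on an estimate of the prompt size. A minimal sketch; the four-characters-per-token heuristic is a rough assumption, and a real tokenizer count would be more accurate:

```python
# Cap max_tokens so the estimated prompt plus the completion stay inside
# the 204,800-token window. The chars-per-token ratio is an assumption.
CONTEXT_WINDOW = 204_800

def max_completion_tokens(prompt: str, chars_per_token: int = 4) -> int:
    prompt_tokens = len(prompt) // chars_per_token
    return max(0, CONTEXT_WINDOW - prompt_tokens)
```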

What Comes Next

The next target is throughput: collecting live telemetry, watching real request behavior, and tuning concurrency, router policy, and pricing around the usage we actually see.

If you want to run long-context coding agents on MiniMax-M2.7, get Wafer Pass or review the pricing and plan details.