wafer-cli: GPU Superpowers for Your Coding Agent

Give your AI coding assistant direct access to GPU documentation, trace analysis, and remote kernel evaluation with wafer-cli.

January 20, 2026 · Wafer Team

If you're developing kernels using Claude Code, Codex, or another AI coding assistant, you've likely noticed a few things:

  • If you're on a machine without a GPU, development is clunky: every command has to be run over SSH
  • Trace files are huge and hard for agents to navigate
  • Agents' knowledge cutoff means they frequently have outdated information about syntax, GPU specs, etc.

wafer-cli fixes this.


The Problem

AI coding assistants are responsible for an ever-growing share of the code we write, and kernels are no different. But users typically run their coding agents on a local development environment, a laptop, or inside a sandbox, not attached to a GPU.

So you end up having to:

1. Write the kernel optimization
2. Sync the file to a GPU box
3. Run it, profile it, sync the results back
4. Read the profiler output yourself, or paste a mess into the chat
5. Correct your agent's out-of-date knowledge
6. Repeat...

And if you don't have your own always-on GPU, you're stuck paying for idle time or waiting out the cold-start delays of GPU providers.


The Solution

wafer-cli is a command-line tool that gives your coding agent direct access to GPU development primitives. Install it, and your agent can:


1. Query GPU documentation with citations

```bash
wafer corpus download cuda  # installs to a tempdir
wafer agent -t ask-docs --corpus cuda "What causes shared memory bank conflicts?"
```

Instead of "Claude thinks it remembers something about 32-byte accesses," you get answers grounded in the actual CUDA documentation, with citations you can verify.


2. Analyze performance traces

```bash
wafer agent -t trace-analyze --args trace=./profile.ncu-rep "Why is this kernel slow?"
```

Point it at an NCU report, NSYS trace, or PyTorch profiler output. The agent reads the trace data and tells you where the bottlenecks are—memory bound, compute bound, warp divergence, whatever it finds.

Your agent can also query traces directly and write its own SQL:

```bash
wafer nvidia perfetto query trace.json \
  "SELECT name, dur/1e6 as ms FROM slice WHERE cat='kernel' ORDER BY dur DESC LIMIT 10"
```
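Once the agent has `(name, ms)` rows shaped like that query returns, summarizing where GPU time goes is a few lines of Python. A minimal sketch; the kernel names and timings below are made up for illustration, not real trace output:

```python
# Illustrative (name, ms) rows, shaped like the SQL query above returns.
# These numbers are invented for the example, not real trace output.
rows = [("gemm_kernel", 12.4), ("softmax_kernel", 3.1), ("copy_kernel", 0.8)]

total_ms = sum(ms for _, ms in rows)
for name, ms in rows:
    # Report each kernel's time and its share of the total.
    print(f"{name}: {ms:.1f} ms ({100 * ms / total_ms:.0f}%)")
```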

3. Evaluate kernels on remote GPUs

```bash
wafer evaluate \
  --impl ./kernel.py \
  --reference ./reference.py \
  --test-cases ./tests.json \
  --target my-gpu \
  --benchmark
```

This runs your kernel on a real GPU, checks correctness against a reference implementation, and measures speedup. Your agent can use this in a loop: write code, evaluate, see results, iterate.

We support GPUMode and Kernelbench formats for specifying kernels and test cases.

wafer evaluate doesn't need a GPU on your local machine. wafer-cli handles the remote execution.
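That write-evaluate-iterate loop can be sketched in a few lines. This is an illustrative sketch, not part of wafer-cli: it assumes wafer-cli is on your PATH, that `my-gpu` is a configured target, and the commented-out `revise_kernel` hook is a hypothetical stand-in for wherever your agent rewrites the kernel between attempts.

```python
import subprocess

def build_evaluate_cmd(impl: str, reference: str, tests: str, target: str) -> list[str]:
    # Build the same `wafer evaluate` invocation shown above.
    return [
        "wafer", "evaluate",
        "--impl", impl,
        "--reference", reference,
        "--test-cases", tests,
        "--target", target,
        "--benchmark",
    ]

def iterate(max_attempts: int = 3):
    """Run evaluate up to max_attempts times, stopping on the first pass."""
    cmd = build_evaluate_cmd("./kernel.py", "./reference.py", "./tests.json", "my-gpu")
    for attempt in range(1, max_attempts + 1):
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode == 0:
            return attempt  # correctness checks and benchmark passed
        # revise_kernel(result.stdout)  # hypothetical: agent edits kernel.py here
    return None
```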


Get Started

```bash
uv tool install wafer-cli
wafer login

# optionally install the skill
wafer skill install
# or
wafer skill install -t <claude/codex>
```

Your favorite CLI agent can now use its own bash tool to interact with wafer-cli. Ask it to explore the CLI and it will figure out what to do for you.


Why This Matters

The future of kernel development is AI-assisted. But today's agents are flying blind—no access to real hardware, no grounding in documentation, no ability to profile and iterate.

wafer-cli bridges that gap. Your agent gets the same tools you use, automated and accessible from the command line.

Give your coding agent GPU superpowers.

```bash
uv tool install wafer-cli
```

We'd love your feedback! What features would help your workflow? What's missing? What's broken?

Reach out to us at emilio@wafer.ai or find us on Twitter/X.