wafer-cli: GPU Superpowers for Your Coding Agent
Give your AI coding assistant direct access to GPU documentation, trace analysis, and remote kernel evaluation with wafer-cli.

If you're developing kernels using Claude Code, Codex, or another AI coding assistant, you've likely noticed a few things:
- If you're on a machine without a GPU, development is clunky: every command has to round-trip over SSH
- Trace files are huge and hard for agents to navigate
- Agents' knowledge cutoff means they frequently have outdated information about syntax, GPU specs, etc.
wafer-cli fixes this.
The Problem
AI coding assistants are responsible for a growing share of the code we ship, and kernels are no exception. But users typically run their coding agents on a local development machine, a laptop, or inside a sandbox—not attached to a GPU.
So you end up having to:
1. Write the kernel optimization
2. Sync file to GPU box
3. Run it, profile it, sync results back
4. Read the profiler output yourself, or paste a mess into the chat
5. Correct your agent about its out-of-date knowledge
6. Repeat...
And if you don't have your own always-on GPU, you're stuck paying for idle time or waiting out the cold-start times of GPU providers.
The Solution
wafer-cli is a command-line tool that gives your coding agent direct access to GPU development primitives. Install it, and your agent can:
1. Query GPU documentation with citations
```bash
wafer corpus download cuda   # installs to a tempdir
wafer agent -t ask-docs --corpus cuda "What causes shared memory bank conflicts?"
```

Instead of "Claude thinks it remembers something about 32-byte accesses," you get answers grounded in the actual CUDA documentation, with citations you can verify.
2. Analyze performance traces
```bash
wafer agent -t trace-analyze --args trace=./profile.ncu-rep "Why is this kernel slow?"
```

Point it at an NCU report, NSYS trace, or PyTorch profiler output. The agent reads the trace data and tells you where the bottlenecks are—memory bound, compute bound, warp divergence, whatever it finds.
Your agent can also query traces directly and write its own SQL:
```bash
wafer nvidia perfetto query trace.json \
  "SELECT name, dur/1e6 AS ms FROM slice WHERE cat='kernel' ORDER BY dur DESC LIMIT 10"
```

3. Evaluate kernels on remote GPUs
```bash
wafer evaluate \
  --impl ./kernel.py \
  --reference ./reference.py \
  --test-cases ./tests.json \
  --target my-gpu \
  --benchmark
```

This runs your kernel on a real GPU, checks correctness against a reference implementation, and measures speedup. Your agent can use this in a loop: write code, evaluate, see results, iterate.
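The write-evaluate-iterate contract is easy to picture with a plain-Python sketch. Everything below is illustrative only: the `run_test_cases` harness and the shape of the test cases are invented for this example, not wafer-cli's actual interface. The point is the division of labor: the reference is the straightforward, obviously correct implementation, the impl is the optimized candidate the agent is iterating on, and the harness checks that they agree on every case.

```python
# reference.py: the straightforward, obviously correct implementation.
def reference_scale_sum(xs, scale):
    total = 0.0
    for x in xs:
        total += x * scale
    return total

# kernel.py: the "optimized" candidate the agent is iterating on.
def impl_scale_sum(xs, scale):
    # Hoist the multiply out of the loop: sum once, scale once.
    return scale * sum(xs)

# Stand-in for tests.json: each case is just the kwargs for one run.
test_cases = [
    {"xs": [1.0, 2.0, 3.0], "scale": 2.0},
    {"xs": [], "scale": 5.0},
    {"xs": [0.5] * 100, "scale": -1.0},
]

def run_test_cases(impl, ref, cases, tol=1e-9):
    """Hypothetical harness: True iff impl matches ref on every case."""
    return all(abs(impl(**c) - ref(**c)) <= tol for c in cases)

print(run_test_cases(impl_scale_sum, reference_scale_sum, test_cases))  # True
```

A wrong candidate fails the same check, which is what lets an agent treat evaluation as a tight feedback signal rather than a manual profiling session.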
We support the GPUMode and KernelBench formats for specifying kernels and test cases.
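For a rough sense of what a test-case file carries, here is a hypothetical `tests.json` generated from Python. The name/shape/dtype schema shown is invented for illustration and is not the actual GPUMode or KernelBench format; consult those projects for the real schemas.

```python
import json

# Hypothetical test-case file: each entry names the input shapes and dtype
# a harness might generate. NOT the actual GPUMode/KernelBench schema.
tests = [
    {"name": "small", "shape": [128, 128], "dtype": "float32"},
    {"name": "tall", "shape": [4096, 64], "dtype": "float32"},
    {"name": "large", "shape": [4096, 4096], "dtype": "float16"},
]

with open("tests.json", "w") as f:
    json.dump(tests, f, indent=2)
```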
wafer evaluate doesn't need a GPU on your local machine. wafer-cli handles the remote execution.
Get Started
```bash
uv tool install wafer-cli
wafer login

# optionally install the skill
wafer skill install
# or install it for a specific agent
wafer skill install -t <claude/codex>
```

Now your favorite CLI agent can use its own bash tool to interact with wafer-cli. Just ask it to explore the CLI, and it will figure out what to do for you.
Why This Matters
The future of kernel development is AI-assisted. But today's agents are flying blind—no access to real hardware, no grounding in documentation, no ability to profile and iterate.
wafer-cli bridges that gap. Your agent gets the same tools you use, automated and accessible from the command line.
Give your coding agent GPU superpowers.
```bash
uv tool install wafer-cli
```

We'd love your feedback! What features would help your workflow? What's missing? What's broken?
Reach out to us at emilio@wafer.ai or find us on Twitter/X.