If you're developing kernels using Claude Code, Codex, or another AI coding assistant, you've likely noticed a few things:

If you're on a machine without a GPU, development is clunky: every command has to run over SSH.
Trace files are huge and hard for agents to navigate.
Agents' knowledge cutoffs mean they often have outdated information about syntax, GPU specs, and so on.

The wafer-ai CLI addresses all three.

The Problem

AI coding assistants write a growing share of code, and kernels are no exception. But users typically run their coding agents on a local development machine, a laptop, or inside a sandbox, with no GPU attached.

So you end up having to:

1. Write the kernel optimization

2. Sync the file to the GPU box

3. Run it, profile it, sync results back

4. Read the profiler output yourself, or paste a mess into the chat

5. Correct your agent about its out-of-date knowledge

6. Repeat...

And if you don't have your own always-on GPU, you're stuck either paying for idle time or waiting for a GPU provider's instance to start up.

The Solution

wafer-ai CLI is a command-line tool that gives your coding agent direct access to GPU development primitives. With the right onboarding, your agent can:

1. Query GPU documentation with citations

bash

wafer agent -t docs "What causes shared memory bank conflicts?"

Instead of "Claude thinks it remembers something about 32-byte accesses," you get answers grounded in the actual CUDA documentation, with citations you can verify.

2. Analyze performance traces

bash

wafer agent -t trace-analyze --args trace=./profile.ncu-rep "Why is this kernel slow?"

Point it at an NCU report, NSYS trace, or PyTorch profiler output. The agent reads the trace data and tells you where the bottlenecks are—memory bound, compute bound, warp divergence, whatever it finds.
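To make "memory bound" versus "compute bound" concrete, here is a minimal sketch of the textbook roofline check: compare a kernel's arithmetic intensity (FLOPs per byte moved) with the hardware's ratio of peak compute to peak bandwidth. This is not how wafer-ai analyzes traces; the function name `classify_kernel` and the peak numbers are illustrative assumptions, not the specs of any real GPU.

```python
def classify_kernel(flops, bytes_moved, peak_flops_per_s, peak_bytes_per_s):
    """Roofline-style check: compare arithmetic intensity (FLOPs per byte)
    with the machine balance (peak FLOP/s divided by peak bytes/s)."""
    intensity = flops / bytes_moved
    machine_balance = peak_flops_per_s / peak_bytes_per_s
    return "memory bound" if intensity < machine_balance else "compute bound"


# Placeholder hardware numbers (assumptions, NOT a real GPU's specs).
PEAK_FLOPS = 100e12  # 100 TFLOP/s
PEAK_BW = 2e12       # 2 TB/s, so the machine balance is 50 FLOPs per byte

# Elementwise add of two float32 vectors: 1 FLOP per 12 bytes moved
# (two 4-byte loads and one 4-byte store per element).
n = 1_000_000
vector_add = classify_kernel(n, 12 * n, PEAK_FLOPS, PEAK_BW)

# Square float32 matmul, counting each matrix once:
# 2*m^3 FLOPs over 3 * 4 * m^2 bytes.
m = 4096
matmul = classify_kernel(2 * m**3, 12 * m**2, PEAK_FLOPS, PEAK_BW)

print(vector_add, matmul)
```

A profiler report gives measured values for the same quantities, which is the kind of evidence an agent can reason over instead of guessing.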

Your agent can also query traces directly and write its own SQL:

bash

wafer tool perfetto query trace.json \
  "SELECT name, dur/1e6 as ms FROM slice WHERE cat='kernel' ORDER BY dur DESC LIMIT 10"

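To show what the query above returns, here is a small sketch that runs the same SQL against an in-memory SQLite table standing in for a loaded trace. The table contents are made up, and the sketch assumes `dur` is stored in nanoseconds, as the query's `dur/1e6` conversion to milliseconds implies. It uses Python's `sqlite3` module, not the wafer CLI.

```python
import sqlite3

# Stand-in for a loaded trace: an in-memory table with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE slice (name TEXT, dur INTEGER, cat TEXT)")
conn.executemany(
    "INSERT INTO slice VALUES (?, ?, ?)",
    [
        ("gemm_kernel", 4_200_000, "kernel"),
        ("softmax_kernel", 900_000, "kernel"),
        ("cudaMemcpyAsync", 7_000_000, "cuda_runtime"),  # filtered out by cat
        ("layernorm_kernel", 1_500_000, "kernel"),
    ],
)

# Same query as above: the ten longest kernel slices, durations in ms.
QUERY = """
SELECT name, dur/1e6 AS ms FROM slice
WHERE cat='kernel' ORDER BY dur DESC LIMIT 10
"""
rows = conn.execute(QUERY).fetchall()
for name, ms in rows:
    print(f"{name}: {ms:.2f} ms")
```

The result is a short ranked list of kernels, which is far easier for an agent to act on than the raw trace file.
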
3. Evaluate kernels on remote GPUs

bash

wafer tool eval \
  --impl ./kernel.py \
  --reference ./reference.py \
  --test-cases ./tests.json \
  --target my-gpu \
  --benchmark

This runs your kernel on a real GPU, checks correctness against a reference implementation, and measures speedup. Your agent can use this in a loop: write code, evaluate, see results, iterate.
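The shape of that check can be sketched in plain Python: compare the implementation with the reference on every test case, and only then time both to compute a speedup. This is a CPU-only illustration under assumed names (`reference`, `candidate`, `evaluate`, `best_time`), not wafer-ai's actual evaluation code.

```python
import time


def reference(xs):
    """Straightforward baseline: sum of squares with an explicit loop."""
    total = 0
    for x in xs:
        total += x * x
    return total


def candidate(xs):
    """Candidate implementation under test."""
    return sum(x * x for x in xs)


def best_time(fn, arg, repeats=5):
    """Best-of-N wall-clock time, to reduce timing noise."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(arg)
        best = min(best, time.perf_counter() - start)
    return best


def evaluate(impl, ref, test_cases):
    """Check impl against ref on every case, then report the speedup."""
    correct = all(impl(case) == ref(case) for case in test_cases)
    if not correct:
        # A fast but wrong kernel is not a speedup, so skip timing.
        return {"correct": False, "speedup": None}
    ref_t = sum(best_time(ref, case) for case in test_cases)
    impl_t = sum(best_time(impl, case) for case in test_cases)
    return {"correct": True, "speedup": ref_t / impl_t}


test_cases = [list(range(n)) for n in (10, 1_000, 100_000)]
result = evaluate(candidate, reference, test_cases)
print(result)
```

Gating the timing on correctness matters for an agent loop, since it keeps the agent from optimizing toward a result that no longer matches the reference.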

We support the GPU MODE and KernelBench formats for specifying kernels and test cases.

wafer tool eval doesn't need a GPU on your local machine. wafer-ai handles the remote execution.

Talk to the Team

We can help your team get set up and map wafer-ai workflows into your existing agent stack.

Why This Matters

The future of kernel development is AI-assisted. But today's agents are flying blind—no access to real hardware, no grounding in documentation, no ability to profile and iterate.

wafer-ai bridges that gap. Your agent gets the same tools you use, automated and accessible from the command line.

Give your coding agent GPU superpowers.

We'd love your feedback!

We would love to hear from the community. What features would help your workflow? What's missing? What's broken?

Reach out to us at emilio@wafer.ai or find us on Twitter/X.