
Profile-Guided GPU Kernel Optimization
How adding profiling tools to our CLI helped an agent break through a theory-based optimization plateau, achieving 11.65x speedup on the Kimi Delta Attention kernel.

We used an AI agent to optimize AMD's topk_sigmoid kernel, achieving a 9x speedup over PyTorch. Here's exactly how our agent did it.

How a fused kernel claiming a 104x speedup passed our correctness checks while reading garbage memory, and the determinism check that catches it.

LLM-generated kernels are all the rage right now. We used frontier AI models to write HIP kernels for KernelBench and ran them on MI300Xs. Which ones performed the best?

Give your AI coding assistant direct access to GPU documentation, trace analysis, and remote kernel evaluation with wafer-cli.

The GPU documentation tool that thousands of engineers loved in our IDE extension is now available as a standalone web app.

Profile AMD GPUs directly in VS Code and Cursor. View hardware metrics, roofline analysis, and kernel stats, all without leaving your editor.

Open Chrome trace JSON files directly in your IDE with full Perfetto functionality: timeline, flamegraphs, SQL, and metrics.

Wafer is the GPU development stack that lives inside your editor: profiling (NCU), compiler explorer, and enhanced GPU docs.

As the AI hardware ecosystem rapidly expands, choosing the right accelerator has become increasingly complex. We're excited to introduce Chip Benchmark, an open-source benchmarking suite purpose-built to evaluate the performance of open-weight LLMs across diverse hardware platforms.

Large language models are driving a surge in inference workloads. While the AI community often gravitates toward better-known GPUs, AMD's MI300X quietly stands out: it is equipped with 192 GB of HBM3 and 5.3 TB/s of memory bandwidth. We explore how targeted optimization and quantization can unlock its potential.