Trace Compare: Compare vLLM traces across platforms

Get accurate 1:1 kernel mappings across hardware providers. Compare large vLLM traces in seconds with clean prefill vs. decode separation.

February 10, 2026·Wafer Team

TLDR

  • Our new tool lets you compare large vLLM traces in seconds: two gigabyte-sized traces take under 30 seconds
  • Accurate 1:1 mapping of kernels across platforms (e.g. NVIDIA vs. AMD). Find fusion opportunities and see why your model runs slower on one platform than on another
  • Clean prefill vs. decode kernel separation

The Problem

vLLM traces contain valuable information: which kernel launched where, how long it took, and so on. But they're large and complex. Even with Perfetto, it's difficult to look at one trace and get the full picture, let alone compare it against another.
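To make the shape of this data concrete, here is a minimal sketch of summing kernel time per kernel name. It assumes the trace is a Chrome-trace-format JSON file (the format Perfetto opens) in which GPU kernels appear as complete events (`"ph": "X"`) with a `kernel` category and a `dur` in microseconds. The helper names and the sample events are made up for illustration, and this is not how Trace Compare itself is implemented.

```python
import gzip
import json
from collections import defaultdict


def load_trace_events(path):
    """Load the event list from a Chrome-trace-format JSON file (optionally gzipped)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as f:
        data = json.load(f)
    # The format allows either {"traceEvents": [...]} or a bare list of events.
    return data["traceEvents"] if isinstance(data, dict) else data


def kernel_time_by_name(events):
    """Sum the duration (microseconds) of GPU kernel events per kernel name."""
    totals = defaultdict(float)
    for ev in events:
        # Assumption: kernels are complete events ("X") in a "kernel" category.
        is_kernel = str(ev.get("cat", "")).lower() == "kernel"
        if ev.get("ph") == "X" and is_kernel:
            totals[ev["name"]] += ev.get("dur", 0)
    return dict(totals)


# Tiny hand-written sample; the kernel names are made up for illustration.
sample_events = [
    {"ph": "X", "cat": "kernel", "name": "example_gemm", "ts": 0, "dur": 40},
    {"ph": "X", "cat": "kernel", "name": "example_gemm", "ts": 50, "dur": 60},
    {"ph": "X", "cat": "cpu_op", "name": "aten::mm", "ts": 0, "dur": 5},
    {"ph": "X", "cat": "kernel", "name": "example_sort", "ts": 120, "dur": 10},
]
totals = kernel_time_by_name(sample_events)
for name, total_us in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {total_us:.0f} us")
```

Even this simple summary says nothing about which kernels on another platform are equivalent, which is where manual comparison gets slow.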

For example, given one trace from NVIDIA and one from AMD, mapping equivalent kernels is a manual and painfully slow process, separating prefill from decode is nearly impossible, and fusion opportunities are needles in a haystack. As a result, it's hard to tell where either platform's kernels are falling behind.
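As a rough illustration of why the manual mapping is tedious, the sketch below groups kernels from two platforms into coarse categories by keyword and compares the time spent in each. The keyword rules, category names, and kernel names are all hypothetical. A hand-rolled heuristic like this is only a starting point and is not the mapping Trace Compare performs.

```python
import re
from collections import defaultdict

# Hypothetical keyword rules; real kernel names vary by vendor and library.
CATEGORY_RULES = [
    ("attention", re.compile(r"attn|attention|flash", re.IGNORECASE)),
    ("gemm", re.compile(r"gemm|matmul", re.IGNORECASE)),
    ("reduce", re.compile(r"reduce", re.IGNORECASE)),
    ("sort", re.compile(r"sort", re.IGNORECASE)),
]


def categorize(kernel_name):
    """Return the first category whose pattern matches, else 'other'."""
    for category, pattern in CATEGORY_RULES:
        if pattern.search(kernel_name):
            return category
    return "other"


def compare_by_category(totals_a, totals_b):
    """Map category -> (time on platform A, time on platform B), in microseconds."""
    by_category = defaultdict(lambda: [0.0, 0.0])
    for index, totals in enumerate((totals_a, totals_b)):
        for name, duration_us in totals.items():
            by_category[categorize(name)][index] += duration_us
    return {category: tuple(times) for category, times in by_category.items()}


# Made-up per-kernel totals (microseconds) for two platforms.
platform_a = {"example_flash_attn_fwd": 120.0, "example_gemm_bf16": 300.0}
platform_b = {
    "example_paged_attention": 150.0,
    "example_gemm_bf16_tile": 340.0,
    "example_radix_sort": 25.0,
}
comparison = compare_by_category(platform_a, platform_b)
for category, (time_a, time_b) in sorted(comparison.items()):
    print(f"{category}: A={time_a:.0f} us, B={time_b:.0f} us")
```

Keyword matching breaks down quickly on real traces, where equivalent kernels often share no naming convention at all.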


The Solution

Wafer's Trace Compare solves this in seconds.

Input two traces and get a timeline of every kernel for the two platforms

See which kernel was called when, and the difference in performance. In the traces shown here, NVIDIA's prefill kernels consistently outperform AMD's.

Kernel Timeline

See exactly where prefill ends and decode starts

AMD starts its decode phase with many more sort kernels than NVIDIA does.

Prefill vs Decode
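Once the phase boundary is known, an observation like the sort-kernel count above reduces to a simple split and count. The sketch below assumes the decode start timestamp is already available. How Trace Compare detects that boundary is not described here, and the event fields and kernel names are illustrative.

```python
def split_by_phase(events, decode_start_ts):
    """Split events into (prefill, decode) lists around a known boundary timestamp."""
    prefill = [ev for ev in events if ev["ts"] < decode_start_ts]
    decode = [ev for ev in events if ev["ts"] >= decode_start_ts]
    return prefill, decode


def count_matching(events, keyword):
    """Count events whose kernel name contains the keyword (case-insensitive)."""
    return sum(1 for ev in events if keyword in ev["name"].lower())


# Made-up kernel events; "ts" is the launch timestamp in microseconds.
events = [
    {"name": "example_attention", "ts": 0},
    {"name": "example_gemm", "ts": 10},
    {"name": "example_radix_sort", "ts": 100},
    {"name": "example_radix_sort", "ts": 110},
    {"name": "example_attention", "ts": 120},
]
decode_start_ts = 100  # assumed to be known for this sketch

prefill, decode = split_by_phase(events, decode_start_ts)
sorts_in_prefill = count_matching(prefill, "sort")
sorts_in_decode = count_matching(decode, "sort")
print(f"sort kernels: prefill={sorts_in_prefill}, decode={sorts_in_decode}")
```

Running the same count on each platform's trace gives a like-for-like number per phase.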

Find fusion opportunities immediately

In these two traces, NVIDIA fuses reduction into its attention kernels, whereas AMD does not.

Fusion opportunities
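One simple way to think about spotting a candidate like this: if a kernel category launches on one platform but never on the other, the missing work may have been fused into a neighboring kernel. The sketch below assumes kernels have already been labeled with coarse categories. The labels are hypothetical, and a missing category can also just mean a different algorithm, so treat the output as a lead rather than a conclusion.

```python
def fusion_candidates(categories_a, categories_b):
    """Return the kernel categories that launch on only one of the two platforms."""
    set_a, set_b = set(categories_a), set(categories_b)
    return {
        "only_a": sorted(set_a - set_b),
        "only_b": sorted(set_b - set_a),
    }


# Made-up category sequences for one layer on each platform.
layer_on_a = ["attention", "gemm"]
layer_on_b = ["attention", "reduce", "gemm"]

candidates = fusion_candidates(layer_on_a, layer_on_b)
# A standalone "reduce" exists only on platform B, so platform A may be
# fusing that reduction into another kernel.
print(candidates)
```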

Try it out today

Install the Wafer extension and click 'Trace Compare'. Open any two vLLM traces and run the analysis to get your results. Results are most meaningful when both traces come from similar workloads.


Give Us Feedback

If there's something else that would make your kernel development faster, let us know.

Reach out at emilio@wafer.ai or find us on Twitter/X.