Pipeline metrics — ROCm Compute Profiler 3.3.1 documentation
The values reported by these metrics do not change regardless of the execution mask of the wavefront. Note that even if the execution mask is identically zero (meaning that no lanes are active), the instruction is still counted, because CDNA accelerators still consider these instructions issued.
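To make the counting rule concrete, here is a toy Python model. It is an illustration under stated assumptions, not how the profiler is implemented. Every instruction increments the issued count whatever its execution mask is, while the mask only changes how many lanes do useful work. The names `count_instructions` and `WAVEFRONT_SIZE` are hypothetical, and wave64 is assumed.

```python
# Toy model only: each issued instruction carries an execution mask with
# one bit per lane of the wavefront (wave64 assumed).
WAVEFRONT_SIZE = 64  # assumption: CDNA-style wave64

def count_instructions(exec_masks):
    """Return (issued, active_lane_ops) for a list of per-instruction exec masks."""
    lane_bits = (1 << WAVEFRONT_SIZE) - 1
    issued = 0
    active_lane_ops = 0
    for mask in exec_masks:
        # Counted as issued even when mask == 0 (no active lanes).
        issued += 1
        # Only the lane-level work depends on the mask.
        active_lane_ops += bin(mask & lane_bits).count("1")
    return issued, active_lane_ops

masks = [(1 << 64) - 1, 0x0000_0000_FFFF_FFFF, 0]
print(count_instructions(masks))  # (3, 96)
```

The third instruction has an all-zero mask, yet it still adds one to the issued count.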
MI300 and MI200 Series performance counters and metrics
The compute unit counters are further classified into instruction mix counters, matrix fused multiply-add (FMA) operation counters, level counters, wavefront counters, wavefront cycle counters, and LDS counters.
GPU architecture hardware specifications — ROCm Documentation
Compute Units: Number of compute units on the GPU.
Wavefront Size: Number of work-items that execute in parallel on a single compute unit. This is equivalent to the warp size in HIP.
LDS: The Local Data Share (LDS) is a low-latency, high-bandwidth scratchpad memory. It is local to the compute units and shared by all work-items in a work group.
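Because a work group executes as a whole number of wavefronts, the wavefront size determines how many wavefronts one work group needs. A minimal Python sketch of that arithmetic follows; the helper name `wavefronts_per_workgroup` is hypothetical.

```python
import math

def wavefronts_per_workgroup(workgroup_size: int, wavefront_size: int) -> int:
    """Number of wavefronts needed to cover a work group.

    A partially filled last wavefront still counts as a whole wavefront;
    its extra lanes are simply inactive.
    """
    return math.ceil(workgroup_size / wavefront_size)

for wave in (32, 64):
    print(wave, wavefronts_per_workgroup(256, wave), wavefronts_per_workgroup(100, wave))
# 32 8 4
# 64 4 2
```

With 100 work-items and wave64, the second wavefront has only 36 active lanes.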
AMD Instinct™ MI100 microarchitecture — ROCm Documentation
A wavefront can occupy any number of VGPRs from 0 to 256, which directly affects occupancy, that is, the number of concurrently active wavefronts in the CU. For instance, with 119 VGPRs used, only two wavefronts can be active in the CU at the same time.
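The VGPR limit on occupancy is a division: round the per-wavefront register count up to the allocation granule, then see how many such allocations fit in the register budget. The Python sketch below is a model only. `VGPR_BUDGET`, `ALLOC_GRANULE`, and `MAX_WAVES` are hypothetical values picked so that the model reproduces the 119-VGPR example above. They are not published MI100 figures, and `waves_limited_by_vgprs` is a hypothetical name.

```python
import math

# Hypothetical parameters, chosen to reproduce the 119-VGPR example;
# they are assumptions, not published MI100 register-file figures.
VGPR_BUDGET = 256   # registers per lane shared by the resident wavefronts
ALLOC_GRANULE = 8   # VGPRs assumed to be allocated in blocks of this size
MAX_WAVES = 8       # hypothetical cap on resident wavefronts

def waves_limited_by_vgprs(vgprs_used: int) -> int:
    """Wavefronts that fit when VGPRs are the only limiting resource."""
    if vgprs_used == 0:
        return MAX_WAVES
    # Round the request up to a whole number of allocation granules.
    allocated = math.ceil(vgprs_used / ALLOC_GRANULE) * ALLOC_GRANULE
    return min(MAX_WAVES, VGPR_BUDGET // allocated)

for v in (32, 64, 119, 128, 129, 256):
    print(v, waves_limited_by_vgprs(v))
# 119 VGPRs round up to 120, and 256 // 120 = 2 wavefronts.
```

The model shows why small register increases can cost a whole wavefront: going from 128 to 129 VGPRs drops the count from two to one.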
Using SPIR-V with rocPRIM — rocPRIM 4.1.0 Documentation
Because the hardware wavefront size is not known at compile time, rocPRIM adds implicit dispatching where it is needed. This allows the same SPIR-V code to work with hardware wavefront sizes of both 32 and 64.
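Conceptually, implicit dispatching means building a specialization for each possible wavefront size ahead of time and selecting one once the real size is known. The Python sketch below is an analogy under that assumption, not rocPRIM's API or its actual mechanism. `make_wave_sum` and `wave_sum_dispatch` are hypothetical names, and the per-lane reduction is simulated with a Python list.

```python
from typing import Callable, Dict, List

def make_wave_sum(width: int) -> Callable[[List[int]], int]:
    """Build a sum reduction specialized for a fixed wavefront width."""
    def wave_sum(lanes: List[int]) -> int:
        if len(lanes) != width:
            raise ValueError("expected one value per lane")
        vals = list(lanes)
        offset = width // 2
        while offset:
            # Butterfly (XOR) exchange: each lane adds its partner's value.
            vals = [vals[i] + vals[i ^ offset] for i in range(width)]
            offset //= 2
        return vals[0]
    return wave_sum

# Both variants exist up front; the choice between them is made at run time.
_WAVE_SUM_IMPLS: Dict[int, Callable[[List[int]], int]] = {
    32: make_wave_sum(32),
    64: make_wave_sum(64),
}

def wave_sum_dispatch(lanes: List[int], hw_wavefront_size: int) -> int:
    """Pick the specialization matching the wavefront size reported at run time."""
    impl = _WAVE_SUM_IMPLS.get(hw_wavefront_size)
    if impl is None:
        raise ValueError(f"unsupported wavefront size: {hw_wavefront_size}")
    return impl(lanes)

print(wave_sum_dispatch(list(range(32)), 32))  # 496
print(wave_sum_dispatch([1] * 64, 64))         # 64
```

Keeping both specializations costs some code size, but it lets one binary run correctly on either wavefront size.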
Hardware features — HIP 7.1.52802 Documentation
The RDNA architectures feature an experimental compiler option, -mwavefrontsize64, which determines the wavefront size used for kernel code generation. When this option is disabled, the native wavefront size of 32 is used; when it is enabled, a wavefront size of 64 is used.
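Because the same source can be compiled for either wavefront size, code should avoid hard-coding 32 or 64 (in HIP device code, the built-in `warpSize` serves this purpose). The simplified Python model below shows how the option changes the wavefront size and therefore the width of a per-lane mask. It assumes only the two architecture families discussed here, with CDNA always using wave64. `kernel_wavefront_size` and `full_lane_mask` are hypothetical names.

```python
def kernel_wavefront_size(arch_family: str, wavefrontsize64: bool = False) -> int:
    """Wavefront size kernel code is generated for (simplified model)."""
    if arch_family == "CDNA":
        return 64  # CDNA accelerators execute wave64
    if arch_family == "RDNA":
        # Native wave32 unless -mwavefrontsize64 is enabled.
        return 64 if wavefrontsize64 else 32
    raise ValueError(f"unknown architecture family: {arch_family}")

def full_lane_mask(wavefront_size: int) -> int:
    """A per-lane mask needs one bit per lane; this one has every lane set."""
    return (1 << wavefront_size) - 1

for family, flag in (("RDNA", False), ("RDNA", True), ("CDNA", False)):
    w = kernel_wavefront_size(family, flag)
    print(family, flag, w, hex(full_lane_mask(w)))
```

A mask stored in a 32-bit variable would silently drop half the lanes once the option is enabled, which is why the width should follow the wavefront size.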
Shader engine (SE) — ROCm Compute Profiler 3.3.1 documentation
When analyzing these metrics, you should also take into account the kernel's achieved occupancy, such as wavefront occupancy. A kernel may be occupancy-limited by LDS usage, for example, yet still achieve occupancy high enough that raising it further would not improve performance.
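An LDS limit on occupancy can be estimated by dividing the LDS capacity of a CU by the LDS each work group allocates. The Python sketch below assumes 64 KiB of LDS per CU (`LDS_BYTES_PER_CU` is an assumption; check the hardware specifications for your GPU) and ignores every other limit, such as VGPRs. `lds_limited_waves` is a hypothetical name.

```python
import math

LDS_BYTES_PER_CU = 64 * 1024  # assumption: verify against your GPU's specifications

def lds_limited_waves(lds_per_workgroup: int, workgroup_size: int,
                      wavefront_size: int = 64) -> int:
    """Wavefronts per CU allowed by LDS alone (other limits ignored)."""
    if lds_per_workgroup <= 0:
        raise ValueError("this model only applies to kernels that allocate LDS")
    workgroups = LDS_BYTES_PER_CU // lds_per_workgroup
    return workgroups * math.ceil(workgroup_size / wavefront_size)

print(lds_limited_waves(32 * 1024, 256))  # 2 work groups x 4 wavefronts = 8
print(lds_limited_waves(16 * 1024, 256))  # 4 work groups x 4 wavefronts = 16
```

Halving the LDS allocation doubles this ceiling. As the text notes, that only pays off if the achieved occupancy was actually holding performance back.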