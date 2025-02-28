AMD today announced its new RDNA 4 GPU architecture, which powers the new Radeon RX 9070 series desktop GPUs. The Santa Clara company has made some fairly tall performance claims, and if you are wondering how AMD managed to achieve them, the company says it is all thanks to its vision of making "an architecture built for gaming".

The Radeon team says that rasterization performance per Compute Unit (CU) per clock on RDNA 4 is nearly twice that of RDNA 2 (RX 6000 series) and nearly 40% higher than that of RDNA 3 (RX 7000 series).

Ray tracing sees an even bigger improvement, as the 3rd Gen ray accelerators in RDNA 4 are claimed to be about 2.4x as fast as those in RDNA 2 (that is, 140% faster) and over 70% faster than those in RDNA 3. Thus, AMD is showing some impressive numbers here in terms of IPC (instructions per cycle).
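For readers who find the mix of "times" and "percent" figures confusing, the conversion can be sketched as below. This is only illustrative arithmetic on the numbers AMD quoted; the function names are ours and are not from AMD.

```python
def percent_faster(speedup: float) -> float:
    """Convert a speedup multiple (e.g. 2.4x as fast) to a 'percent faster' figure."""
    return (speedup - 1.0) * 100.0


def speedup_from_percent(percent: float) -> float:
    """Convert a 'percent faster' figure (e.g. 70% faster) to a speedup multiple."""
    return 1.0 + percent / 100.0


# AMD's quoted ray tracing claim versus RDNA 2: about 2.4x.
print(round(percent_faster(2.4)))        # 140
# AMD's quoted ray tracing claim versus RDNA 3: over 70% faster, so at least 1.7x.
print(round(speedup_from_percent(70.0), 2))  # 1.7
```

In other words, "2.4x" and "140% faster" are the same claim stated two ways.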

However, AMD understands that raster and ray tracing are not everything in gaming these days, as AI-based upscaling technology is also fairly common now. As such, Team Red's new GPUs are said to see the biggest gain in AI and ML processing. RDNA 4 promises up to four times the FP16 dense matrix throughput of RDNA 2 and twice that of RDNA 3.

AMD is moving back to a monolithic design with RDNA 4, and the new 9070 series cards are built on the TSMC 4nm process. The die packs a total of 53.9 billion transistors into an area of 356.5 sq. mm.
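As a rough sanity check, the two figures above imply an average transistor density that can be worked out as follows. This is our own back-of-the-envelope division, not a number AMD has published.

```python
# Figures quoted by AMD for the RDNA 4 die.
transistors = 53.9e9      # 53.9 billion transistors
die_area_mm2 = 356.5      # die size in square millimetres

# Average density in millions of transistors per square millimetre.
density_mtr_per_mm2 = transistors / die_area_mm2 / 1e6

print(round(density_mtr_per_mm2, 1))  # 151.2
```

That works out to roughly 151 million transistors per square millimetre on average; actual density varies across different parts of a die.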

AMD has shared a high-level logical layout of the RX 9070 XT die, which has 64 CUs in total arranged across four shader engines, where each shader engine comprises eight Workgroup processors. If you recall, each of AMD's RDNA Workgroup processors, or WGPs, houses two compute units.
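The CU count follows directly from that hierarchy. A minimal sketch of the arithmetic, using only the numbers stated above (the constant names are ours):

```python
SHADER_ENGINES = 4            # shader engines on the RX 9070 XT die
WGPS_PER_SHADER_ENGINE = 8    # Workgroup processors per shader engine
CUS_PER_WGP = 2               # compute units per Workgroup processor

total_wgps = SHADER_ENGINES * WGPS_PER_SHADER_ENGINE
total_cus = total_wgps * CUS_PER_WGP

print(total_wgps)  # 32
print(total_cus)   # 64
```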

With its new 3rd Gen ray tracing (RT) accelerators, RDNA 4 brings improvements to both ray traversal and shading performance.

The RX 9070 series is said to pack double the ray intersection rate. For those unfamiliar, the intersection rate is a measure of how quickly a GPU can test whether rays hit a surface or object in the scene.
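To give a feel for what a single intersection test involves, here is a minimal software sketch of the well-known "slab" test for a ray against an axis-aligned box. It illustrates the general technique only; it is not AMD's hardware implementation, and the function name is hypothetical.

```python
import math


def ray_hits_aabb(origin, direction, box_min, box_max):
    """Slab test: does a ray starting at `origin` hit the axis-aligned box?"""
    t_near, t_far = 0.0, math.inf
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if d == 0.0:
            # Ray is parallel to this pair of planes: it must start between them.
            if o < lo or o > hi:
                return False
            continue
        # Distances along the ray to the two planes bounding this axis.
        t0, t1 = (lo - o) / d, (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        # Shrink the interval in which the ray is inside all slabs so far.
        t_near = max(t_near, t0)
        t_far = min(t_far, t1)
        if t_near > t_far:
            return False
    return True


# A diagonal ray reaches the box; a ray along the x axis passes beneath it.
print(ray_hits_aabb((0, 0, 0), (1, 1, 1), (1, 1, 1), (2, 2, 2)))  # True
print(ray_hits_aabb((0, 0, 0), (1, 0, 0), (1, 1, 1), (2, 2, 2)))  # False
```

A GPU performs enormous numbers of tests like this per frame, which is why doubling the rate at which they can be done matters.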

The new architecture also features significant improvements to BVH (Bounding Volume Hierarchy) primitive node compression to reduce memory bandwidth requirements. Along with that, RDNA 4 moves from a BVH4 to a BVH8 implementation, which has been done to complement the doubled intersection rate. If you are wondering, a BVH is a tree of nested bounding volumes used to accelerate ray tracing by letting the GPU skip parts of the scene a ray cannot hit.
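One way to see why a wider BVH pairs well with a higher intersection rate is to compare idealized tree depths. The sketch below assumes a perfectly balanced tree with one primitive per leaf and a hypothetical scene of one million primitives; real BVHs are not this tidy, so treat the numbers as illustrative only.

```python
def min_tree_depth(num_leaves: int, branching_factor: int) -> int:
    """Smallest depth at which a full tree with this branching factor has enough leaves."""
    depth, capacity = 0, 1
    while capacity < num_leaves:
        capacity *= branching_factor
        depth += 1
    return depth


PRIMITIVES = 1_000_000  # hypothetical scene size, chosen for illustration

for width in (4, 8):
    depth = min_tree_depth(PRIMITIVES, width)
    # Each visited node tests all of its child boxes against the ray.
    box_tests = depth * width
    print(f"BVH{width}: depth {depth}, {box_tests} box tests along one root-to-leaf path")
# BVH4: depth 10, 40 box tests along one root-to-leaf path
# BVH8: depth 7, 56 box tests along one root-to-leaf path
```

Under these assumptions, BVH8 needs fewer node visits (and so fewer dependent memory fetches) per ray, but each visit tests twice as many boxes, which is where the doubled intersection rate comes in.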

And it does not stop there, as false-positive intersections are also culled thanks to Oriented Bounding Boxes (OBB), which can be rotated to fit the geometry more tightly than axis-aligned boxes. AMD says that this new technology helps to boost ray traversal performance by around 10%, though it also depends on the source geometry. Overall, AMD says ray traversal IPC is twice as good on RDNA 4 compared to RDNA 3.
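The benefit of an oriented box is easiest to see with long, thin geometry lying diagonally. The following is a simplified 2D sketch with made-up dimensions (a 10 x 0.2 strip rotated by 45 degrees); it is not AMD's implementation, and the helper names are hypothetical.

```python
import math


def rotated_rect_corners(length, width, angle_deg):
    """Corners of a length x width rectangle centred at the origin, rotated by angle_deg."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    hl, hw = length / 2, width / 2
    local = [(-hl, -hw), (hl, -hw), (hl, hw), (-hl, hw)]
    return [(x * c - y * s, x * s + y * c) for x, y in local]


def aabb_area(points):
    """Area of the smallest axis-aligned box containing all points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (max(xs) - min(xs)) * (max(ys) - min(ys))


LENGTH, WIDTH, ANGLE = 10.0, 0.2, 45.0  # illustrative values only

corners = rotated_rect_corners(LENGTH, WIDTH, ANGLE)
obb = LENGTH * WIDTH          # an oriented box can hug the strip exactly
aabb = aabb_area(corners)     # an axis-aligned box must cover the whole diagonal

print(round(obb, 2))          # 2.0
print(round(aabb, 2))         # 52.02
print(round(aabb / obb, 2))   # 26.01
```

In this contrived case, the axis-aligned box is about 26 times larger than the oriented one, and every ray that hits that extra empty space is a false positive that still has to be processed. How much is saved in practice depends on the source geometry, as AMD notes.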

Following ray traversal, AMD has also highlighted the shading enhancements on its new GPUs. First up, the latency of memory requests is significantly reduced with a new Out of Order Memory returns technology. The technique makes out-of-order shader executions possible even in the case of cache misses and consequent wave request delays.

The new design also brings dynamic vector general purpose register (VGPR) management and allocation, thus improving ray occupancy by lowering the number of idle VGPRs. This should be a considerable help for ray shading, since it typically requires more registers than ray traversal.

For some reason, AMD has not provided any details about the rasterization advancements as the focus this time is clearly on ray tracing and AI. In general, though, it does note that its RDNA 4 CUs feature "enhanced memory subsystem and improved scalar units" as well as "much higher clock speeds."
