The Turing Architecture
Turing was first introduced in August 2018 at SIGGRAPH with the Quadro RTX series. With these new professional graphics cards, Nvidia revealed the changes coming with the Turing architecture. The major changes were a new Streaming Multiprocessor (SM) design, Turing Tensor Cores, Ray Tracing (RT) Cores, Mesh Shading, Variable Rate Shading (VRS), Texture Space Shading (TSS), and Multi-View Rendering (MVR).
On the surface, this may seem like a big departure from their previous GPUs. In reality, most of these features already existed in Volta, with the exception of the RT cores. For the sake of brevity, this article won’t take an in-depth dive into the Turing/Volta architecture. Expect an article on that in the future, with a direct comparison of the architectural features shared between the two. For now, let’s do a quick overview of the new features and what they do.
New Streaming Multiprocessor – With Turing, Nvidia stepped away from their typical gaming chip design and moved to a more compute-oriented design, which I am guessing is due to the compute-heavy nature of ray tracing. Without going into a deep dive, the big takeaways are the addition of the Volta integer pipeline, Tensor FP32 accumulate instructions and the new RT Cores, the removal of the FP64 cores, and a larger, much lower latency cache. For comparison, here are block diagrams for Turing, Volta and Pascal.
Tensor Cores – Much like the tensor cores in Volta, these speed up A.I. workloads by performing matrix math operations. In the move from Volta, Turing adds INT8 and INT4 precision operations, which accelerate workloads that don’t require FP16 or FP32 precision. These new tensor cores enable the use of DLSS in applications.
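To make the reduced-precision idea a bit more concrete, here is a minimal sketch in plain Python of a matrix multiply done on 8-bit integers. It is only an illustration of the arithmetic, not how the hardware is programmed: the function names and the single shared scale per matrix are my own assumptions, and the wide integer accumulator stands in for whatever the tensor cores use internally.

```python
def quantize(matrix, scale):
    """Map floats to signed 8-bit integers in [-127, 127] using one shared scale."""
    return [[max(-127, min(127, round(v / scale))) for v in row] for row in matrix]


def int8_matmul(a_q, b_q):
    """Multiply two int8 matrices, summing the products in a wider integer."""
    rows, inner, cols = len(a_q), len(b_q), len(b_q[0])
    out = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0  # assumption: stands in for a wide hardware accumulator
            for k in range(inner):
                acc += a_q[i][k] * b_q[k][j]
            out[i][j] = acc
    return out


def dequantize(acc, scale_a, scale_b):
    """Convert the integer result back to floats."""
    return [[v * scale_a * scale_b for v in row] for row in acc]


def float_matmul(a, b):
    """Reference full-precision multiply for comparison."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]
```

Running `dequantize(int8_matmul(quantize(a, s), quantize(b, s)), s, s)` next to `float_matmul(a, b)` shows the trade: the integer path only approximates the float result, which is acceptable for workloads like inference that tolerate a small error.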
Ray Tracing Cores – Here we have the most marketed, yet least understood and, dare I say, most controversial update to Nvidia’s architecture. Contrary to what some people may think, the RT cores do not actually process ray tracing (shocker, I know). Keeping it short, these cores process the geometry information that the rays (which are computed on the shaders) need in order to sample data more efficiently. They do this by reading geometry data stored in Bounding Volume Hierarchies (BVHs), which reduce the complexity of handling raw geometry data. RT cores are not required for ray tracing, unless Nvidia is hiding data from us concerning Volta. A more in-depth look at how these work will come in a later article detailing the Turing architecture.
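For anyone who has not run into a Bounding Volume Hierarchy before, here is a rough Python sketch of the general idea: test the ray against nested boxes and only keep the geometry inside boxes the ray actually enters. This is the textbook technique, not Nvidia's implementation. The `Node` layout and function names are hypothetical, and the final ray-versus-triangle test is left out.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Node:
    lo: Vec3  # minimum corner of the bounding box
    hi: Vec3  # maximum corner of the bounding box
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    primitives: Optional[List[int]] = None  # set on leaves only


def ray_hits_box(origin, direction, lo, hi):
    """Slab test: does the ray (t >= 0) pass through the axis-aligned box?"""
    t_near, t_far = 0.0, float("inf")
    for o, d, l, h in zip(origin, direction, lo, hi):
        if d == 0.0:
            # Ray is parallel to this pair of planes; it must start between them.
            if o < l or o > h:
                return False
            continue
        t0, t1 = (l - o) / d, (h - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return False
    return True


def candidate_primitives(root, origin, direction):
    """Return (primitive ids worth testing, number of box tests performed)."""
    stack, found, tests = [root], [], 0
    while stack:
        node = stack.pop()
        tests += 1
        if not ray_hits_box(origin, direction, node.lo, node.hi):
            continue  # skip this whole subtree
        if node.primitives is not None:
            found.extend(node.primitives)
        else:
            stack.extend(n for n in (node.left, node.right) if n is not None)
    return found, tests
```

The payoff is in the `continue`: one failed box test throws away every primitive underneath that node, so a ray that misses the scene costs a single test instead of one per triangle.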
Mesh, Variable Rate and Texture Space Shading – Turing introduces a few new (or updated) shading techniques that developers can leverage to increase performance. First, there is Mesh Shading. This is a new pipeline added with the architecture that allows geometry to be processed and dispatched on the GPU as opposed to the CPU, allowing more objects to be drawn with less performance overhead. It also allows for dynamic, CPU-independent LOD (mesh Level of Detail) management.
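As a rough illustration of what LOD management means, here is a tiny Python sketch of a distance-based LOD pick. With mesh shading, a decision like this would run per object in a GPU shader stage rather than on the CPU. The function name and the distance thresholds are hypothetical.

```python
import math


def select_lod(camera, center, thresholds):
    """Return a LOD index: 0 is the most detailed mesh, higher is coarser.

    Each distance threshold the object lies beyond drops one level of detail.
    """
    distance = math.dist(camera, center)
    lod = 0
    for limit in thresholds:
        if distance > limit:
            lod += 1
    return lod
```

With thresholds of `[10, 50, 200]`, an object 5 units away keeps its full mesh (LOD 0), while one 60 units away falls to LOD 2.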
Variable Rate Shading (VRS) – Variable Rate Shading works by lowering the amount of shading done on particular areas of the scene, while attempting to maintain the look of full-resolution shading. There are three different methods by which this happens. First, Motion Adaptive Shading (MAS) detects motion and reduces shading quality on whatever is moving, where the loss in quality can’t be perceived anyway. Then there is Content Adaptive Shading (CAS), which detects similarity in color via temporal or spatial means and reduces shading based on how similar those pixels are. Finally, there is Foveated Rendering, which relies on eye tracking to reduce shading in the areas of the scene the player is not currently focusing on. To date, there is only one implementation of VRS on the market, and that is in the recently updated Wolfenstein 2.
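To give a feel for the content adaptive case, here is a minimal Python sketch that picks a shading rate for a tile of pixels based on how much its luminance varies. This is my own simplification and not Nvidia's algorithm: the variance measure, the two thresholds, and the three rates are all assumptions chosen for illustration.

```python
def tile_variance(tile):
    """Variance of the luminance values (0.0 to 1.0) in a 2D tile."""
    values = [v for row in tile for v in row]
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def shading_rate(tile, low=0.001, high=0.01):
    """Pick a coarser rate for flat tiles and full rate for detailed ones.

    Returns (width, height) of the pixel block that shares one shading result.
    The thresholds are hypothetical tuning values.
    """
    var = tile_variance(tile)
    if var < low:
        return (4, 4)  # nearly uniform: shade once per 4x4 block
    if var < high:
        return (2, 2)  # mild detail: shade once per 2x2 block
    return (1, 1)      # high detail: shade every pixel
```

A flat patch of sky lands on `(4, 4)` and costs a sixteenth of the shading work, while a high-contrast texture stays at `(1, 1)`.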
Other Features – Major features aside, Turing brings with it a number of other updates.
- GDDR6 – The latest memory standard, allowing for 14Gbps per pin vs the 11Gbps (or 12Gbps on select models) of previous generations with GDDR5X.
- NVLink – For the first time, NVLink comes to consumers, replacing the old SLI fingers. This allows for memory coherency between devices at up to 100GBps of bidirectional bandwidth.
- Improved Video and Display Engine – Adds support for DP 1.4a (with DSC), handles HDR processing natively (reducing performance loss) and adds tone mapping to the HDR pipeline.
- USB-C VirtualLink – A new standard that allows VR headsets to connect through a single port. Also charges cell phones 😛
- Further Improved Delta Compression
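As a side note on the GDDR6 numbers in the list above, the quoted speeds are per pin, so total memory bandwidth also depends on how wide the memory bus is. Here is a quick Python sketch of that arithmetic. The bus widths in the usage note are assumptions for illustration and are not taken from this article.

```python
def memory_bandwidth_gb_per_s(pin_rate_gbps, bus_width_bits):
    """Peak bandwidth in GB/s: per-pin rate (Gbit/s) times bus width, over 8 bits per byte."""
    return pin_rate_gbps * bus_width_bits / 8
```

For example, 14Gbps on an assumed 256-bit bus works out to 448 GB/s, while 11Gbps on an assumed 352-bit bus works out to 484 GB/s, which is why a faster memory standard does not automatically mean more total bandwidth.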