Titan V vs Titan RTX Raytracing Performance
What difference do RT cores actually make
When Battlefield V launched, we tested the performance of the Titan V vs the 2080 Ti to see how far RT cores really got you. Unfortunately, the small test that we did run never saw the light of day outside of a few comments made on certain tech sites, mainly due to life. With the holidays, work schedules and the Titan RTX review, it got pushed onto the back burner. Little did we know that it would somehow evolve into a semi-controversy in the tech world. Given what we know about DXR, Turing and, more importantly, the Titan V, why did we ever question it? For those that aren't following the story, we'll dive into it.
A couple of days ago on another forum, users were cross-comparing results of the Titan V with the 2080 Ti in Battlefield V. Let's stop here and address that as the first problem in the chain of events. Testing performance on two different systems, with two different people orchestrating the flow of the benchmark procedure, is a setup for failure. In games like Battlefield, where there isn't a built-in benchmark, results are going to be skewed more than they would be in a closed environment due to operational variables. We get it, though; we've done the same internally, where we do quick comparisons of performance with someone to see where it lies, but it should never be reported on by the tech media unless they are verifying the results themselves. There are simply too many variables for that to be a credible analysis.
Here are a few quotes that I would like to pull out of the original thread that started this discussion, and point out their merits and flaws.
"As pointed out in our Titan RTX review, RT cores do not enable nor execute the RT functionality. They simply accelerate the processing of BVHs. They are not a requirement for ray tracing, and the 'shaders' themselves aren't just shading the rays, they are calculating the rays as well."
This statement is mostly correct and pretty well informed. BVH processing is very intensive on GPUs that do not have RT cores; this is the primary benefit of Turing when it comes to ray tracing of any kind (be it hybrid or production). Ray tracing is, again, processed on the ALUs (or shaders), with the RT cores acting as an accelerator in the middle of the process. The problem with the statement is that it implies other architectures can do the same, when in fact they do not support real-time hybrid rendering or accelerated ray tracing at all. That is reserved for Volta and Turing. We'll get into that more later.
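To make that concrete, here is a minimal CPU-side sketch of what BVH traversal involves: a slab test against each bounding box and a stack walk per ray. All structures and numbers here are illustrative, not DXR's actual data layout; the point is that this per-ray bookkeeping is exactly the work RT cores offload, and that shaders must do themselves on hardware without them.

```python
# Hypothetical, simplified BVH for illustration only (not DXR's real format).
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class Node:
    lo: Tuple[float, float, float]                 # AABB min corner
    hi: Tuple[float, float, float]                 # AABB max corner
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    tris: List[int] = field(default_factory=list)  # triangle ids at a leaf

def hit_aabb(origin, inv_dir, lo, hi) -> bool:
    """Slab test: does the ray intersect the axis-aligned box?"""
    tmin, tmax = 0.0, float("inf")
    for o, d, a, b in zip(origin, inv_dir, lo, hi):
        t1, t2 = (a - o) * d, (b - o) * d
        tmin = max(tmin, min(t1, t2))
        tmax = min(tmax, max(t1, t2))
    return tmin <= tmax

def traverse(root: Node, origin, direction):
    """Walk the BVH, returning candidate triangles and the node-visit count."""
    inv_dir = tuple(1.0 / d if d != 0.0 else float("inf") for d in direction)
    stack, hits, visited = [root], [], 0
    while stack:
        node = stack.pop()
        visited += 1
        if not hit_aabb(origin, inv_dir, node.lo, node.hi):
            continue                     # prune this whole subtree
        if node.tris:
            hits.extend(node.tris)       # leaf: hand triangles to the shader
        else:
            stack += [node.left, node.right]
    return hits, visited
```

Multiply that traversal loop by millions of rays per frame and it is easy to see why doing it in fixed-function hardware, rather than on the shader ALUs, matters.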
So, we have clarified that Volta is ray tracing compatible: it's been tested and, to the shock of many, it worked. The fact that many tech outlets picked up this story appears to me to come down to a few points: either they thought that ray tracing wasn't possible on Volta, or they thought that the performance was going to be vastly different, and according to this source it wasn't. Let's look into the first part of that.
When Microsoft originally announced DXR, they did so while remaining neutral (for the most part) on the hardware front. DXR was not to be tied to any particular hardware brand, as it is an industry standard and needs to be able to run on any GPU that can run DirectX going forward. There were to be no walls here. Hardware limitations were mentioned, but it wasn't made very clear that, at the time, there was only one GPU that could run true DXR: GV100, the chip that powers the Titan V. If you've been following DXR then you may have also heard of the compatibility layer. This worked in the opposite direction, allowing developers to test DXR code on non-accelerated hardware (Pascal) through emulation. I tested this myself on a few GPUs to see the performance of early DXR samples.
So how were people supposed to know that DXR ran on Volta if Microsoft didn't explicitly state that it would (though, deeper in the documentation, they did)? That's where Nvidia comes in. In an announcement that happened at the same time as the DXR announcement, Nvidia unveiled their RTX branding. In it they dove into how they were implementing DXR into their own software stack and were going to build upon it. Ultimately, RTX is an extension of software built on the DXR foundation (the DirectX side of it) which contains a number of software approaches to raytracing and, later, DLSS. In this announcement, Nvidia states that currently only Volta GPUs are capable. Link to article here
"At GDC 2018, NVIDIA unveiled RTX, a high-performance implementation that will power all ray tracing APIs supported by NVIDIA on Volta and future GPUs. At the same event, Microsoft announced the integration of ray tracing as a first-class citizen into their industry standard DirectX API."
Now that we have established that Volta always had DXR support, what of the performance difference? 'In Nvidia's own demonstrations it was shown that Volta was vastly inferior to Turing when it came to raytracing. How is it that suddenly, in the only publicly available commercial DXR implementation, Volta is keeping up with Turing?' (not a direct quote, but an argument I have seen on the web from numerous people at this point). Let's set the performance metric aside for a moment; we'll touch on that again later. For now, let's look at the actual test that we are measuring. Obviously, the test is Battlefield V, the first commercial game to implement DXR. This is where we have to look into what exactly DXR is. Here are a few quotes from the Microsoft DXR brief:
"DXR will initially be used to supplement current rendering techniques such as screen space reflections, for example, to fill in data from geometry that’s either occluded or off-screen."
"This means that it’s now possible for developers to build games that use rasterization for some of its rendering and raytracing to be used for the rest."
"Today, we are introducing a feature to DirectX 12 that will bridge the gap between the rasterization techniques employed by games today, and the full 3D effects of tomorrow."
Reading those quotes, or reading the announcement (ignoring all RTX briefs at this point), we can come to the conclusion that DXR in its current state and its very first implementation is simply a hybrid implementation of raytracing in video games. At this stage, it is only being used to supplement existing techniques to increase fidelity. This is important to know because it puts into perspective the workload that you are dealing with in this benchmark, and what effect fixed-function accelerators are going to have on it.
RT cores are accelerators of BVH structures. They make the raytracing process more efficient by traversing those structures faster than shaders would be able to on their own. That is their only function. Now, given what we know about the hybrid implementation of raytracing in this game, RT cores are only going to be used for a small portion of the workload that each frame is produced from. Battlefield V is a game that is still rendered primarily by rasterization techniques, with mostly compute-oriented shading. Raytracing is only a small part of the equation. Only portions of the game's scenes are affected by rays, and with the latest update rays won't be cast onto objects at all if they have no effect on them in the first place. This further reduces the game's dependency on RT fixed-function hardware. So, in a situation where most of your workload is still rasterized, a benchmark of the Titan V vs Turing isn't going to show as dramatic a result, considering the Titan V is still a beast of a GPU and only part of the workload is going to favor Turing over Volta. If this were a fully raytraced scene with support for RT cores, the tables would turn on Volta very quickly.
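That reasoning is just Amdahl's law applied to frame time, and it can be sketched in a few lines. The fractions and speedup factors below are hypothetical, chosen only to illustrate the shape of the curve, not measured from Battlefield V:

```python
def hybrid_speedup(rt_fraction: float, rt_accel: float) -> float:
    """Overall frame-time speedup when only the ray-tracing slice of the
    frame (rt_fraction) is accelerated by rt_accel; the rasterized
    remainder runs at the same speed on both GPUs (Amdahl's law)."""
    return 1.0 / ((1.0 - rt_fraction) + rt_fraction / rt_accel)

# Illustrative numbers: if ray tracing were 30% of the frame and RT cores
# made that slice 4x faster, the whole frame would only be ~1.29x faster,
# and even an infinitely fast RT slice would cap out at 1/(1-0.3) ~= 1.43x.
```

This is why a mostly rasterized hybrid title can never show the full gap between accelerated and unaccelerated ray tracing, no matter how fast the RT cores are.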
Finally, on to the performance. There is, again, a problem with taking results posted to forums by multiple people, on multiple systems, in uncontrolled environments. To help clarify the situation and paint a much clearer picture, we tested the game on a closed system (same hardware, same person), found a good scene within Battlefield V in the single player campaign (to reduce even more variables) and pitted the two Titans against each other. For the scene we picked a map that resembled the workload of Rotterdam as closely as possible. We couldn't find a replication of 'White House', but instead found a section with a good amount of raytracing going on to show the extent to which RT cores can accelerate the process and give a significant performance boost over Volta. If you do not want to watch the video, I've created charts to show the difference in performance between the two. Unfortunately, for this test I have no way of knowing the exact number of rays being cast, nor the ability to query the RT cores to know how much we are saving, but the numbers speak for themselves.
Using this graph, we can see that on Ultra the Titan RTX is 45% faster at 4K, and on Low the Titan RTX is 28% faster. If you have been following what the developers have said about their optimizations with DXR, we know that Ultra is going to cast more rays and have more affected materials than Low. These additional rays and effects are going to trigger more use of the RT cores, and this is where we see the gap grow even further between the two. The more rays that get cast, the bigger the difference in performance. The performance gain these RT cores give is also going to scale with resolution, since DXR casts rays based on the number of pixels in your resolution. 'Radolov' over at YouTube was nice enough to dig up the performance settings for DXR in Battlefield V:
Low: 0.9 smoothness cut-off and 15.0 percent of screen resolution as maximum ray count.
Med: 0.9 smoothness cut-off and 23.3 percent of screen resolution as maximum ray count.
High: 0.5 smoothness cut-off and 31.6 percent of screen resolution as maximum ray count.
Ultra: 0.5 smoothness cut-off and 40.0 percent of screen resolution as maximum ray count.
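Turning those percentages into absolute numbers shows why the RT-core advantage grows with both preset and resolution. This is a quick back-of-the-envelope calculation using the quoted settings (the percentages come from the list above; the resolutions are just standard 4K and 1440p, which I've assumed for illustration):

```python
# Maximum ray budget per preset, per the quoted Battlefield V DXR settings.
PRESET_RAY_PCT = {"Low": 0.15, "Medium": 0.233, "High": 0.316, "Ultra": 0.40}

def max_rays(width: int, height: int, preset: str) -> int:
    """Maximum rays per frame: a fixed percentage of the pixel count."""
    return round(width * height * PRESET_RAY_PCT[preset])

# At 4K (3840x2160 = 8,294,400 pixels), Ultra allows up to ~3.32M rays per
# frame while Low allows only ~1.24M; at 1440p, Ultra drops to ~1.47M rays.
```

So going from Low to Ultra at a fixed resolution raises the ray budget by roughly 2.7x, and going from 1440p to 4K at a fixed preset raises it by another 2.25x, which lines up with the gap between the two cards widening at higher settings and resolutions.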
In conclusion, this article was written to serve two primary purposes. The first was to correct some misconceptions about the performance of Turing vs Volta in a raytracing scenario. There was a misconception made in a forum post that, unfortunately, other press websites picked up on without doing the research the topic required to do their job as reporters: educate people. This one particular quote made me go this in-depth on something that was otherwise going to be a simple comparison.
"It is a bit of a remarkable story really, but users have enabled RTX mode on a Nvidia Titan V, which works quite well and performs as fast as the RTX 2080 Ti. Titan V, however, is Volta, and Volta does not have any RT cores." - Hilbert @ Guru3D
There is simply so much wrong with that opening statement that I couldn't ignore it. You can read his article here and see his take on it. While remaining as professional as possible, I have to say I am not impressed. The obligatory 'grain of salt' comment should not be a shield against presenting as much factual information as possible, and it shouldn't enable you to present something like this as clickbait. Don't take this as a swing at Hilbert or Guru3D; I am simply trying to show an example of a growing problem within the tech world, maybe even journalism in general. Findings such as this should be researched with more insight and thought in order to steer your audience (those who have trusted you enough to read your writings for insight) in the correct direction. We all make mistakes, though. I have done it, I will do it again, and I will do it some more after that. It's human nature. What isn't OK is laziness being the reason, commenting on things you don't know while presenting yourself as someone of experience, or outright lying (not saying he did the last two, just things I have noticed a lot lately in tech journalism).
The second reason for this article was to attempt to educate or enlighten anyone reading this who has an interest in the technology. This wasn't a deep dive, but more an analytical approach to debunking claims like these and putting them in the correct light. I hope that I was able to present this data in a way that gave some insight into how DXR/RTX work, the history so far, and how fixed-function hardware can help accelerate it in games.
I do not wish this article to be taken as a justification for Turing as a purchase, or for anyone to say that Volta is anything less than it is (a beast of a GPU). This was simply showing two GPUs designed for different workloads in a test that favors Turing. What you can gather from this is that Nvidia isn't 'ripping' anyone off with features on Turing that also work on Volta. Seriously, if you think that paying half as much for a GPU that is 45% faster in this workload is misleading, I don't know what to tell you. There was nothing misleading about the way that Nvidia presented the DXR/RTX capabilities of Volta and Turing. It was stated from the very beginning that Volta would be able to raytrace, and that Turing's RT cores accelerate raytracing.
That's it for this one, guys. I hope that you got something out of this article. I apologize if this isn't the best written piece you've seen in a while; I kind of just sat down and fleshed it out real quick after doing some thinking. If there is something I misspoke on, or something I am missing that you would like to see added, please let me know in the comments or send us an email through the contact page.
* Update *
I reached out to Nvidia to confirm the reason that Volta is so much faster than Pascal. My suspicions were confirmed: the cache differences between the two architectures (differences Volta shares with Turing) are what make it so much faster. Volta has about 4x lower cache latency, more bandwidth, and twice the low-level cache. Also added 1440p numbers.