Comments
So some interesting info.
4.16 wins yet again. It certainly isn't a fluke. They still have not totally fixed the problem.
Also, two 3090s can indeed beat a single 4090. Though these are Kingpins, so they can go a touch faster. But even without that, just looking at my 3090, which can top 20 iterations per second, two of them would still top a single 4090 in the same version of DS.
It also seems to show the trend that the faster cards suffer the most. The gap between 4.16 and later is roughly around a 3060's iteration rate. That is such a staggering loss.
Eventually I am going to try some more taxing scenes to compare how the performance holds.
The 3090 Kingpins at stock settings max out at 1995 MHz, so I don't think they are that much faster than most other 3090s. I did try the benchmark at 2200 MHz with the VRAM overclocked by 1200 MHz, and it only reduced the benchmark render time by about 2.5 to 3.0 seconds. However, those results probably aren't relevant to most 3090 cards.
I think the 4090 has more potential in other rendering software. In the Port Royal ray tracing benchmark it takes a 4090 cooled by liquid nitrogen to beat two 3090s. However, in the fully path-traced benchmarks, an air-cooled or water-cooled 4090 will beat two 3090s.
Here is a link to Vray's benchmark list.
https://benchmark.chaos.com/v5/vray-gpu-rtx?gpu=;1&index=1&ordering=desc&by=avg&my-scores-only=false
This is a bit confusing, but I think we need to remove the outliers that might have been crazy overclocked. They have a count of 188 benchmarks of 4090s hitting 5100 points. The peak is 5809, but with only 5 benchmarks at that number, I throw that out.
The 3090 has one score at 2850, but again, that is just one bench. Scroll down and you will find 468 benchmarks that scored 2613. So the 4090 doesn't quite double the 3090, but at a 95% gain it sure is close. If you take the peak scores, you actually get over 103%.
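To spell those ratios out, here's a quick check in Python (the scores are just the numbers quoted above):

typical_4090, typical_3090 = 5100, 2613   # the 188-result and 468-result clusters
peak_4090, peak_3090 = 5809, 2850         # single-run outliers

print(f"Typical: {typical_4090 / typical_3090:.3f}x")  # ~1.952x, i.e. a 95% gain
print(f"Peak: {peak_4090 / peak_3090:.3f}x")           # ~2.038x, i.e. over a 103% gain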
For Octane, here are 4 different test scenes compared by Petapixel.
The Interior scene is the most Daz-like, as it is a living room.
Blender benchmarks also vary by scene. Tom's Hardware ran the 4090 through its paces, and their review has all of this.
https://www.tomshardware.com/reviews/nvidia-geforce-rtx-4090-review/6
In these benchmarks the gap varies by a fair bit. In two of the simpler scenes the 4090 easily doubles the 3090. The Junkshop is the most complicated of the trio, and here the difference is 70%. That is more like what we get in our bench. I am not sure if that scene is more complex than ours or not.
The trend is that the more shader-complex scenes slip below doubling performance. Our scene really isn't complex, though. We have a few spheres in a box, with the most complex aspect by far being the Genesis 8 character, whose surface settings I assigned (though Raydiant swapped the bump and spec maps to G8F's when he made it; I used G3F's bumps because I felt they were more pleasing). Her skin has high translucency, dual-lobe specular gloss, and chromatic SSS. I felt it was important for something in the scene to have all of these elements.
I know this is an older thread, but I recently (this week) upgraded from dual GTX960s (4G) to a single RTX3060 12G and am SOOO very happy!
System Configuration
System/Motherboard: MSI Intel Skylake Z170A Gaming M7 Motherboard
CPU: Intel(R) Core(TM) i7-6700K CPU @ 4.00 GHz (Stock) (Watercooled)
GPU: Nvidia RTX 3060 12G (Stock)
System Memory: Corsair Vengeance ??? / 32G
OS Drive: 150G SATA
Asset Drive: 1T SATA
Power Supply: 750W
Operating System: Windows 7 Pro SP1 (6.1.7601 Build 7601)
Nvidia Drivers Version: 474.11
Daz Studio Version: 4.21.0.5 Pro (64-bit)
Benchmark Results
DAZ_STATS
2023-01-21 22:21:28.220 [INFO] :: Total Rendering Time: 4 minutes 48.99 seconds
IRAY_STATS
2023-01-21 22:21:42.864 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : CUDA device 0 (NVIDIA GeForce RTX 3060): 1800 iterations, 2.520s init, 283.612s render
Iteration Rate: (1800 / 283.612) iterations per second = 6.347
Loading Time: ((0 * 3600 + 4 * 60 + 48.99) - 283.612) seconds = 5.378
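If you want to script those two calculations instead of doing them by hand, here is a minimal sketch in Python (the regular expressions assume the exact DAZ_STATS/IRAY_STATS log format pasted above):

import re

daz_stats = "2023-01-21 22:21:28.220 [INFO] :: Total Rendering Time: 4 minutes 48.99 seconds"
iray_stats = ("2023-01-21 22:21:42.864 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : "
              "CUDA device 0 (NVIDIA GeForce RTX 3060): 1800 iterations, 2.520s init, 283.612s render")

# Total render time as reported by Daz Studio (the minutes part is optional).
m = re.search(r"Total Rendering Time: (?:(\d+) minutes? )?([\d.]+) seconds", daz_stats)
total_seconds = int(m.group(1) or 0) * 60 + float(m.group(2))

# Iteration count and pure render time as reported by Iray for the device.
iters, render_s = re.search(r"(\d+) iterations, [\d.]+s init, ([\d.]+)s render", iray_stats).groups()

print(f"Iteration rate: {int(iters) / float(render_s):.3f} it/s")  # 6.347
print(f"Loading time: {total_seconds - float(render_s):.3f} s")    # 5.378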
4.21.1.29 beta
Stock 4090 - water cooled
25.01 iterations per second
+175 core / 750 mem with stock power target
26.51 iterations per second
No change from last beta
Nice job getting your miles out of the 960s and grats on the upgrade!
Important log entry, thanks. I will hold off on updating to 4.21.1.29 if there is no change for ADA. 25 iterations per second is a solid baseline for the 4090 for where we are today (or about 31 with 4.16).
Any guesses @ RTX6000 ADA? I should have numbers in this weekend.
Well, the 4090 is more cut down than the 3090 was. The 3090 was so close to the A6000 that its higher clock speeds allowed it to easily overcome the small drop in cores and beat the A6000. But the 6000 ADA has a 1792 core advantage over the 4090. That's an 11% gap. I can't seem to find clock speeds listed online, but the power is set at 300 Watts instead of the 450 W of the 4090. But we also know the 4090 isn't running remotely close to its TDP for Iray. If the 6000 ADA is reasonably close to the 4090 in clocks it should score a win. It might be close, though.
A lot hinges on just what the 4090 is doing with Iray. We see it drop to around 280 Watts with Iray. Note that is less than the 6000 ADA's total rated power. How much power will the 6000 use with Iray? I am going to guess about 230 Watts: I believe the 6000 ADA will target its full TDP of 300 Watts with Iray (whereas I believe the 4090 targets 350 W), and you subtract around 70 W for the PCIe bus being underutilized during GPU rendering. That comes out to 230 Watts. So the cards will be roughly 50 Watts apart. Will that 50 Watts be enough to overcome the core gap? Maybe not.
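To lay that guesswork out plainly (the wattages are my estimates from above, not measurements; the core counts are Nvidia's published specs):

cores_4090, cores_6000ada = 16384, 18176
print(f"Core gap: {(cores_6000ada - cores_4090) / cores_4090:.1%}")  # ~10.9% in the 6000 ADA's favor

tdp_6000ada = 300     # rated board power (W)
pcie_overhead = 70    # my guess at power budgeted to the underutilized PCIe bus (W)
print(f"Estimated Iray draw: {tdp_6000ada - pcie_overhead} W")  # ~230 W vs ~280 W seen on the 4090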
I would suggest actually comparing the output renders, particularly with regard to their filesize in PNG format.
The iteration rate may have changed, but it's possible that the algorithm has been updated in a way that clears the render up more per sample.
PNG doesn't compress noise efficiently, and a more complete render will generally have a smaller filesize, so it's at least a somewhat objective measurement of the "cleanness" of the image (and less likely to be an unknown in the way that Iray's convergence-percentage method might have changed). If nothing else, you could recompress both versions in the same software to be sure they're compressed with equal parameters.
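A minimal sketch of that recompression idea, assuming Pillow is installed (the file names are placeholders for your two renders):

import io
from PIL import Image

def recompressed_size(path: str) -> int:
    # Re-encode with fixed PNG settings so both images are compared fairly.
    buf = io.BytesIO()
    Image.open(path).save(buf, format="PNG", optimize=True, compress_level=9)
    return buf.getbuffer().nbytes

for path in ("render_v416.png", "render_v421.png"):
    print(path, recompressed_size(path), "bytes")  # noisier render -> larger file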
~~~~~
My own experimentation has shown that in some cases, while the Caustic Sampler roughly halves the iteration rate, the render can be clearer in the same time, because each iteration clears up the render much more. This includes scenes with no apparent caustics (I generally use it in poorly lit scenes, as it seems to improve light propagation even when caustics are not involved).
Looking at some cut-outs from this render I did last month...
... but before any resizing, denoising or post-processing:
This render has the caustic sampler on, rendered to 2048 iterations: https://i.imgur.com/hEwJhiu.png
And this one has it off, rendered to 4096 iterations: https://i.imgur.com/xPJRjxH.png
I've lost the exact numbers for the total time, but the image with the sampler on was actually a hair faster to complete.
(And yes, I screwed up and had SubD off on the hair in the first of these tests, but I believe the effect of that on the result/speed is negligible.)
However, despite the halved sample count, to my eye, the render with the sampler on is significantly less noisy, something that the PNG filesizes somewhat support (these two previews are 787 and 815 KB respectively, although the difference in light levels means that has to be taken with a pinch of salt).
Iterations are not always equal. Here one setting change means that the iteration rate halves, but each iteration does more than twice the work, so in the same time the render is cleaner.
If we're making comparisons across Iray versions and proclaiming that the iteration rate is slower, we also need to know whether those same 1800 iterations are actually equal.
I cannot - cannot emphasize this enough. There is a reason why I titled this benchmarking thread "Rendering Hardware Benchmarking" rather than "Rendering Software Benchmarking": the base unit of work (Iray iterations) from which all of the performance calculations here are derived only stays constant if you are keeping to the same version of Iray.
To turn it into a car analogy - imagine a vehicle's distance traveled is cumulative render quality, the rate of its tires rotating is rendering iteration rate, and its engine speed is rendering hardware performance. Iray is like a car where the designers are constantly experimenting with different wheel diameters. All else being equal, bigger wheels always mean lower iteration rates and vice versa. But distance traveled (cumulative render quality) remains the same. Different versions of Iray taking different amounts of time to accomplish the same number of iterations is a completely meaningless statistic - unless there is also a non-corresponding change in cumulative render quality being observed at the same time.
Imo that is an endeavor of study best left to people familiar with third-party SSIM tools (since that seems to be the best way to get around the classic subjectivity trap that comes up whenever you see the word quality), using a completely different benchmarking procedure. Ideally it would be studied in a separate forum thread, since there is too much confusing crossover when trying to do two very interrelated things (benchmarking different hardware on the same software, and different software versions on the same hardware) in the same place. It's really two completely separate (but both interesting) topics.
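For anyone who wants to try that, a minimal sketch of an SSIM comparison using scikit-image (the file names are placeholders, and the two renders must have identical dimensions):

import numpy as np
from PIL import Image
from skimage.metrics import structural_similarity

# Load both renders as grayscale float arrays.
a = np.asarray(Image.open("render_v416.png").convert("L"), dtype=np.float64)
b = np.asarray(Image.open("render_v421.png").convert("L"), dtype=np.float64)

# 1.0 means structurally identical; lower means more divergence (e.g. noise).
score = structural_similarity(a, b, data_range=255.0)
print(f"SSIM: {score:.4f}")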
I have tested the caustic sampler as well. I stand by what I said. The caustic sampler is not used in the benchmark, and the benchmark is set to an iteration cap, so the parameters would need to be changed to test this.
But I entertained the idea that the caustic sampler is reducing noise. You compared 4.21 to itself. Let's compare 4.21 to 4.16, and try to compare the pics.
I went back to the benchmark and ran some tests between 4.21.1.26 and 4.16.0.3. I ran two tests with each, one with caustics, and one with caustics off. I also saved the pics.
4.21.1.26 Caustics ON
2023-01-26 02:37:23.006 [INFO] :: Total Rendering Time: 3 minutes 31.84 seconds
2023-01-26 02:37:53.061 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : CUDA device 0 (NVIDIA GeForce RTX 3090): 1307 iterations, 1.763s init, 208.007s render
2023-01-26 02:37:53.061 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : CUDA device 1 (NVIDIA GeForce RTX 3060): 493 iterations, 1.292s init, 207.352s render
4.21 Caustics OFF
2023-01-26 02:42:28.423 [INFO] :: Total Rendering Time: 1 minutes 21.14 seconds
2023-01-26 02:42:41.237 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : CUDA device 0 (NVIDIA GeForce RTX 3090): 1297 iterations, 0.767s init, 78.724s render
2023-01-26 02:42:41.237 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : CUDA device 1 (NVIDIA GeForce RTX 3060): 503 iterations, 0.729s init, 78.952s render
4.16.0.3 Caustics ON
2023-01-26 02:51:55.665 Total Rendering Time: 2 minutes 14.55 seconds
2023-01-26 02:52:00.090 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : CUDA device 0 (NVIDIA GeForce RTX 3090): 1298 iterations, 1.191s init, 131.917s render
2023-01-26 02:52:00.090 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : CUDA device 1 (NVIDIA GeForce RTX 3060): 502 iterations, 1.202s init, 131.855s render
4.16 Caustics OFF
2023-01-26 02:54:51.627 Total Rendering Time: 1 minutes 8.80 seconds
2023-01-26 02:55:00.671 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : CUDA device 0 (NVIDIA GeForce RTX 3090): 1285 iterations, 1.107s init, 66.252s render
2023-01-26 02:55:00.671 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : CUDA device 1 (NVIDIA GeForce RTX 3060): 515 iterations, 0.985s init, 66.052s render
Even with caustics enabled, 4.16 runs faster than 4.21. Let's look at some results.
4.21 Caustics ON
4.21 Caustics OFF
Hmm, so comparing 4.21 to itself, it does seem like something is going on. It took longer to render with caustics, but it does look cleaner.
4.16 Caustics ON
4.16 Caustics OFF
4.16 is much faster, but there is some noise. So how do we determine this? Well, 4.21 with caustics ON took 204 seconds to reach the 1800 iteration count. So I will remove the iteration cap and in its place add a 200 second cap. Sound fair?
With these parameters, here are the numbers:
4.16 Caustics OFF, no iteration cap, 200 sec limit
2023-01-26 03:00:50.730 Total Rendering Time: 3 minutes 22.15 seconds
2023-01-26 03:01:16.215 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : CUDA device 0 (NVIDIA GeForce RTX 3090): 3929 iterations, 1.368s init, 199.270s render
2023-01-26 03:01:16.215 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : CUDA device 1 (NVIDIA GeForce RTX 3060): 1584 iterations, 1.422s init, 199.226s render
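Since both cards finish at nearly the same wall time, their iteration counts simply add; a quick sketch of the combined throughput from the log lines above:

devices = {"RTX 3090": (3929, 199.270), "RTX 3060": (1584, 199.226)}

total_iterations = sum(iters for iters, _ in devices.values())
wall_time = max(seconds for _, seconds in devices.values())
print(f"{total_iterations} iterations in ~{wall_time:.0f} s "
      f"= {total_iterations / wall_time:.2f} it/s combined")  # 5513 -> ~27.67 it/s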
Um...that is 5,513 iterations. Daz 4.16 can run 5,513 iterations in 200 seconds, where Daz 4.21 only did 1800. Of course, caustics are off in 4.16. So let's take a peek!
Oh my, look at that convergence. The noise is vastly reduced, and if you compare this to 4.21 with caustics, you will see that the noise is actually quite a bit less in this image. Look at the green ball, and the noise on its bottom left edge, compared to the clean look here in 4.16. Look at that MIRROR BALL, it is almost noise free. Anybody want to keep defending 4.21?
Ok, so what if we want that sexy caustics sampler? Let's do this again.
4.16 Caustics ON, no iteration cap, 200 sec limit
2023-01-26 03:17:01.290 Total Rendering Time: 3 minutes 22.0 seconds
2023-01-26 03:17:13.004 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : CUDA device 0 (NVIDIA GeForce RTX 3090): 1988 iterations, 1.081s init, 199.504s render
2023-01-26 03:17:13.005 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : CUDA device 1 (NVIDIA GeForce RTX 3060): 788 iterations, 0.940s init, 199.445s render
Now it is starting to look unfair for 4.21. With caustics enabled, 4.16 ran 2,776 iterations in 200 seconds. Remember, 4.21 ran 1800 in 204 seconds. So what does the image look like?
It is not quite as clean as the no caustics image, but it is still cleaner than 4.21. Again, just look at the edge of the green ball.
You can also see less noise on the girl, and the image is just cleaner. The floor does have something going on: the caustics in 4.21 have changed, so the two images differ there. But this is still a cleaner pic.
So once again, I cannot emphasize enough how much slower 4.21 is than 4.16 in every possible capacity. If you want more iterations, you can get more iterations. Without caustics you can get crazy iteration counts, literally running circles around 4.21. The hardware is used by the software; at some point the software has to keep up. It hasn't. And not only is it slower, but they managed to break a laundry list of features. These do matter to users, because these users are artists. Artists value consistency as much as speed. Daz users have to keep trying to replicate what they did in previous versions to achieve the same look. Some don't care, but those who have ongoing projects that depend on this consistency really need it.
If we want to talk cars, Daz 4.21 is a Ford Pinto. It doesn't matter how much they try to pimp this ride. It is still slower, guys. Aside from that, Iray has actually not constantly changed the iteration parameters. This is really a recent thing. For several years, 4.8 to 4.12, Iray was the same speed in these tests, no matter what changed. Then something happened that appears to be a driver bug more than anything else (rolling back drivers recovered speed in the SAME version of Daz). Then 4.14 got a boost from normal maps, and the speed stayed mostly the same until 4.20. Besides that, the differences were not as extreme. 4.20 reduced speed by so much that it HAD to be talked about. 4.21 does very little to improve on that.
It seems like the Iray dev team is very much focused on caustics.
Well, I think you're agreeing with my general point, but to restate myself more bluntly, comparing version to version iteration rates should never be seen as more than a (very approximate) conversion factor to compare to benchmarks for a card that hasn't been tested on a specific version.
While some versions of Iray may take different amounts of time to get to a specific number of iterations, that doesn't inherently mean that Iray is slower; each Iray version will almost certainly need a different number of samples to reach the same visual quality.
EDIT: (Nor, indeed, should we expect exactly equivalent improvements in every scene; while *this* scene might have a slower iteration in one version than another, changes to Iray might benefit a different scene).
It is funny how the Nvidia Control Panel spells out the "Ada Generation", literally:
What's the IO like on it? I've been dying to know whether the rumors are true that NVLink is truly absent from the entire Ada stack.
@outrider42, I want to try out 4.16 on a heavy render and run both it and 4.21 for like 30 minutes and see the differences in the end. Do I need to do something specific to get it to run on a 4090 or will it run out of the box?
Along with this, does anyone out there have any knowledge of whether running the 4090 in a PCIe 3 slot/motherboard vs a newer board with PCIe 4 has any bearing on actual RENDER times? The PCIe 4 slot has twice the throughput, but I'm wondering if that really has any bearing on render times, since the rendering happens on board the GPU and shouldn't be affected by bus transfer speed to the CPU.
Anyone?
Yes, it can work pretty much out of the box. Several people have already posted 4090 times with Daz 4.15/16.
It shouldn't be an issue. PCIe is barely utilized by Iray. If there is any performance hit, it would be so small that it would be pointless to worry about. There have been tests that prove this, too.
Some other GPU render engines might make use of PCIe, but they only do so with "out of core" rendering, which Daz Iray does not do. Octane has out-of-core rendering, and they have data proving that yes, limited PCIe bandwidth can hurt render times. But you also need to know what out of core means. Out of core is when you make a scene so large it doesn't fit in the GPU's VRAM, and the render engine basically uses the CPU to help by using system RAM. The GPU still runs (again, Iray does NOT have this feature). There is a huge performance hit for using out of core regardless of how fast PCIe is, because the data has such a long distance to travel. It is why Iray doesn't bother with the feature, though IMO it would still be better than CPU only.
I point that out only if you plan on using other render engines. Again, Iray does not have this feature, so PCIe has no bearing on its performance.
System/Motherboard: SuperMicro X12
CPU: 2x Xeon Gold 6348
GPU: A6000 GPU + RTX6000 ADA Generation
System Memory: 512 GB DDR4 ECC @ 3200 MHz
OS Drive: SK hynix Platinum P41 2TB PCIe NVMe Gen4
Asset Drive: 256 GB RAM DRIVE
Operating System: Win 11 Pro
Nvidia Drivers Version: 528.24
Tests are with Daz 4.15.0.2 and 4.21.1.26 beta. Load times were in the mid-4-second range.
In short, the 4090 wins the speed contest here.
*Scores above are from sample groups 2 & 6 below, with both cards at +1100.
Again, multi-core CPU performance has improved in absolute terms, while ADA steps ahead of Ampere only proportionally. Both cards are still much slower in 4.21. I might be able to get higher scores with both GPUs and no CPU; I will test soon.
From the package: Nvidia RTX 6000, Part# VCNRTX6000ADA. It has no NVLink connector. Here is the thing that bothers me with the pic below: if you pay $7000 for a GPU, a couple of gold screws are in order.
4x DisplayPort. This package came bundled with a DisplayPort-to-HDMI adapter as well.
On the discussion about NVLink and the PCIe bus, I have read that PCIe 4 and 5 bus speeds are nearly as fast as, or faster than, an NVLink bridge. With that in mind, they have opted to skip it and let individual developers build in support for VRAM pooling across the bus.
In IRay, this may be possible with the NVLink Peer Group setting, given the correct GPU and drivers. I have not been anywhere near the 48 GB scene threshold as of late though.
Yes, NVLink is dead. This could be good, or it could be bad. Indeed, using PCIe is what most believe the plan is, but not until gen 5. Being able to just use PCIe would be awesome, but Nvidia only wanted certain cards to have NVLink. So I would imagine that they would still try to limit users' ability to pool VRAM in some way.
If Nvidia has not released any documentation for this, then Lovelace simply doesn't have it. It might come with the next GPU series. I can't imagine that Nvidia would have this feature and not tell anyone, nor would they add it at a later date post launch. If they were to add it later, they would probably have announced it to get buyers hyped for these cards.
I am still baffled by Nvidia's naming scheme. It is like they are intentionally trying to confuse people. Does the shroud have any reference to ADA at all?
CPU iterations may have gone up, but they are still weak for this task. I also wonder if the CPU doing more iterations is actually part of the problem that this new Iray has. Why is the CPU going faster while the GPU goes slower? Isn't that really odd? I believe they are connected. Does the CPU still render faster when it is the only rendering device?
Maybe we can have CPUs run the test to 180 iterations instead of 1800 so people don't have to wait so long, LOL.
I'm using a PCIe 3 x 16 motherboard and my 4090 render times are comparable with others posted in this thread.
I'm planning on an upgrade later in the year when new stuff has been put through the wringer and prices have settled.
You'd already posted this as a thread in the Commons, and been answered. Please don't double-post.
Believe it or not, Nvidia first introduced both hardware- and driver-level support for pooling VRAM across multiple GPUs using the PCIe bus all the way back in the Pascal days (i.e. with the release of cards like the 1080 Ti). Check out the following excerpt from this 2016 blog post:
Whenever a particular GPU touches data managed by Unified Memory, this data may migrate to local memory of the processor or the driver can establish a direct access over the available interconnect (PCIe or NVLINK).
Many applications can benefit from GPU memory oversubscription and the page migration capabilities of Pascal architecture. Data analytics and graph workloads usually run search queries on terabytes of sparse data. Predicting access patterns is extremely difficult in this scenario, especially if the queries are dynamic. Not only computational tasks but other domains like high-quality visualization can greatly benefit from Unified Memory. Imagine a ray tracing engine that shoots a ray which can bounce off in any direction depending on material surface. If the scene does not fit in GPU memory the ray may easily hit a surface that is not available and has to be fetched from CPU memory. In this case computing what pages should be migrated to GPU memory at what time is almost impossible without true GPU page fault capabilities.
It's just that Nvidia's implementation required certain types of sub-operating-system-level access to GPU hardware on the software side that WDDM-based versions of Windows (Vista and newer) made impossible, making it effectively a Linux-only feature at the time. Which suited Nvidia fine, since the primary use case for it was Nvidia's own DGX-1 line of ready-made Ubuntu-running rackmounted server-class systems, which featured their own custom implementation of Iray at the time (notice that the Iray Server trial is - even to this day - a Windows or Linux compatible download).
Issues with the statement "part of the problem that this new Iray has" aside (recent versions of Iray don't appear to have a particular problem functioning so far as I'm aware - taking longer to complete iterations because iteration generation has been made more complex, in order to more realistically render diverse subject matter, is a proper use case for the saying "it's a feature, not a bug"), the reason is almost certainly all the AI-processing-oriented ASIC hardware the major CPU makers are putting into modern chip designs, due to the current "AI" craze on the business/software development front. Most of those processing components offer substantial benefits to raytracing processes too, specifically.
Unfortunately any change to either iteration count or render dimensions (width x height) of any scene invalidates it for render performance comparison purposes, since changing any of those three values effectively randomizes the seed value used by Iray to determine which ray paths to trace and in what order. It's the equivalent to attempting to compare GPU performance by looking at the average FPS seen while playing a specific game during different sections of gameplay.
A lot of things have technically been introduced years ago, but that means little until software supports it. There is no Iray documentation for this. There has been GPU-to-GPU communication in DirectX 12 as well, where you can use any two DX12 GPUs with each other. They can even mix AMD and Nvidia. But only one game has ever attempted to actually use this feature, Ashes of the Singularity.
I honestly don't believe that is driving the CPU iteration at all. The software has to explicitly take advantage of any new feature the hardware has. Just like Iray was not able to use the ray tracing cores in RTX without undergoing major changes first.
Also, the amount of work an iteration completes has remained remarkably constant ever since Iray's debut. You can just look at the renders and see the noise is basically the same. Keep in mind the recent posts about caustics involve enabling an optional feature; of course using different options is going to alter the iteration. Turning on caustics adds whole new calculation paths for Iray to work out, so yes, of course each iteration takes longer and does a bit more work. That literally changes the parameters of the render, the very thing you say not to do.
But there is still noise. And when you use 4.16 with the default option, you can render a lot more iterations and get a much cleaner image in less time. I am not sure why this is a controversial statement. The images I posted above should make this pretty clear. Iray has improved its caustic sampler, that is great, really, it is, but the fact remains that Iray is still slower by any measurement compared to 4.15/16. Whether you pick iteration count, time, or convergence as your parameter, 4.16 is simply faster in every metric.
System/Motherboard: SuperMicro X12
CPU: 2x Xeon Gold 6348
GPU: A6000 + RTX6000 ADA Generation
System Memory: 512 GB DDR4 ECC @ 3200 MHz
OS Drive: SK hynix Platinum P41 2TB PCIe NVMe Gen4
Asset Drive: 256 GB RAM DRIVE
Operating System: Win 11 Pro
Nvidia Drivers Version: 528.24
PSU: Corsair AX1600i
RTX6000 ADA ONLY:
2023-01-28 18:58:51.714 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend progr: Maximum number of samples reached.
2023-01-28 18:58:51.729 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : Device statistics:
2023-01-28 18:58:51.729 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : CUDA device 0 (NVIDIA RTX 6000 Ada Generation): 1800 iterations, 1.387s init, 77.239s render
2023-01-28 18:58:52.523 [INFO] :: Finished Rendering
2023-01-28 18:58:52.615 [INFO] :: Total Rendering Time: 1 minutes 21.57 seconds
Rendering Performance: (1800/77.239) = 23.30 Iterations Per Second
Loading Time: 4.331 Seconds
Just testing stability and power draw on RTX6000 ADA here. This was at pure stock clocks with no assist from the CPUs or 2nd GPU. It ran @ 77 seconds over 3 runs, each pass with very similar power draw curves (example below). GPU Core temps stayed under 40C on all tests with fan @ 85%.
Sharing the power usage, as that has been a topic of interest of late. The RTX6000 ADA ran near max TDP on each pass. From some other passes, I can get similar speed with less power, say 95%; I will need more testing to see where that starts to change to a noticeable degree.
I wanted to see a head to head with only a CPU rendering. So I ran the benchmark with my 5800X in 4.16 and 4.21, but I capped the iterations to 180. So this is not comparable to other posts, only this one isolated post. Though I would point out the numbers do appear to line up with expectations when compared to other CPUs that have been benched the full 1800 iterations.
System Configuration
System/Motherboard: MSI MPG x570
CPU: AMD 5800x
GPU 0: EVGA 3060 Black @ stock
GPU 1: Nvidia Founder's 3090 @ stock
System Memory: GSkill 64 GB (16 GB x 4) 3200 MHz
OS Drive: Inland M.2 2TB
Asset Drive: Samsung 870 EVO 4TB
Power Supply: EVGA 1000 GQ
Operating System: Windows 10 Pro 22H2 19045.2364
Nvidia Drivers Version: 527.56 SD
4.16.0.3 bench with CPU set to 180 iterations
2023-01-28 15:50:49.317 Total Rendering Time: 4 minutes 5.7 seconds
2023-01-28 15:51:21.852 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : CPU: 180 iterations, 0.678s init, 243.238s render
0.740 iterations per second
4.21.1.26 bench with CPU set to 180 iterations
2023-01-28 16:05:01.586 [INFO] :: Total Rendering Time: 3 minutes 27.16 seconds
2023-01-28 16:05:46.017 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : CPU: 180 iterations, 0.573s init, 205.537s render
0.876 iterations per second
What I really wanted to see was the difference for CPU-only rendering. And CPU rendering is indeed faster, in fact about 18% faster in 4.21 than in 4.16. So any people who are stuck with CPU-only rendering may actually see a nice gain with the newer Iray.
But that doesn't make CPU viable. While the gain here was a nice 18%, we are talking about 0.136 iterations per second gain here, and the CPU still couldn't muster above a single iteration per second. The 5800X may be one generation old now, but it is certainly not trash. The 3090 is an order of magnitude faster. To demonstrate just how wide that gap is, I ran the test with my 3090 in 4.21. Again, I want to point out this was capped at 180 iterations.
2023-01-28 16:22:02.742 [INFO] :: Total Rendering Time: 14.84 seconds
2023-01-28 16:22:19.479 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : CUDA device 0 (NVIDIA GeForce RTX 3090): 180 iterations, 1.902s init, 11.900s render
That is 15.126 iterations per second. Skyshots has two Xeon Gold 6348s, and they barely break the 2-iterations-per-second barrier. While that may be the full 1800 bench, I don't think the shorter test is going to produce a dramatically different result.
I also ran it one more time with 4.16 capped at 180 iterations to compare it to 4.21.
2023-01-28 16:28:21.046 Total Rendering Time: 13.31 seconds
2023-01-28 16:29:04.605 Iray [INFO] - IRAY:RENDER :: 1.0 IRAY rend info : CUDA device 0 (NVIDIA GeForce RTX 3090): 180 iterations, 1.452s init, 10.136s render
That comes to 17.758 iterations per second, which comes out to 17.4% faster.
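Putting the four 180-iteration rates side by side (a quick sketch; the numbers are from the runs above):

gpu = {"4.16": 17.758, "4.21": 15.126}  # RTX 3090, iterations per second
cpu = {"4.16": 0.740, "4.21": 0.876}    # Ryzen 5800X, iterations per second

print(f"GPU: 4.16 is {gpu['4.16'] / gpu['4.21'] - 1:.1%} faster than 4.21")  # ~17.4%
print(f"CPU: 4.21 is {cpu['4.21'] / cpu['4.16'] - 1:.1%} faster than 4.16")  # ~18.4%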
This still raises more questions. Even if we all say that Iray is doing more 'work' per iteration in 4.21, it is quite perplexing that GPUs iterate around 17% slower while CPUs are almost exactly that much faster (18%). The fact that the numbers are so close, just in reverse, makes this even stranger. After all, if each iteration is doing more (which I frankly do not believe), then it should stand to reason that CPU rendering would face the same iteration cut that GPUs do. You see, CPUs could have gained performance and still only matched their previous iteration rate. Understand what I am trying to say?
My scores were based on 56 threads only. If I could get both chips and all 112 threads involved in the render, probably closer to 4 iterations per second. Your finding is much like mine though: around 18-20% improvement in CPU performance with new beta versions of Daz. To validate your results, as RayDAnt mentioned, the full 1800 iteration benchmark would be helpful. Thank you for taking the time to check this though as it represents a major improvement in Daz’s CPU performance.
The graph below shows performance with the Xeon Gold relative to 4.15 as a baseline. These are derived as sub-component scores and done with GPU enabled, but food for thought. The CPU speed dropped going to 4.21.1.13 but then really picked up in the recent 4.21.1.26 beta.
To create true/valid results, I will need to run the full 1800 iterations with just the CPU in each version. Give me a minute..
To your other points, yes, something happened that slowed Daz down in terms of GPU performance. But that is the nature of things. Sometimes dev teams must work within Agile constraints. There may be confidentiality agreements, limited funding, long hours, mixed agendas, changing team members, poor oversight, etc. – take your pick here, or all of the above. At a certain point between 4.16 and 4.20/4.21, an oversight was made OR a decision was made to incorporate changes into the development pipeline that would adversely affect every Nvidia GPU's performance in Iray. It may not be an easy fix, but they are certainly aware. 100% to the Iray dev team's credit, they are looking at our data here (crowdsourcing, if you will) and making improvements.
The point I wanted to make is that the argument that an iteration is doing more work kind of falls apart when CPUs are clearly not impacted the same way. Plus the renders demonstrate this as well.
It is apparent that the Iray dev team is focused heavily on caustics, and I can understand that. Calculating caustics is very hard. Enabling caustics in past versions of Iray killed render speeds, often doubling render times or worse. So having caustics improve is a good thing...but I don't believe it should come at the cost of general performance overall. While caustics are nice, most people don't take notice of this effect, and if you don't have the types of glass-like materials that benefit from caustics, killing performance for caustics is a waste. The most visible impact of caustics here is that the room has a bit more ambient light. That can be replicated with simple tone mapping, while rendering significantly faster (in 4.16).
My belief is the performance drop is down to a simple mistake. They already admitted that they made a mistake when they claimed to fix the performance loss. So I believe that they still have left something unoptimized, and this is wasting resources during rendering. If they fix this, it would benefit the caustics, too, because the base renderer would be optimized, not just the caustic sampler. So it is very much worth their effort to find the solution. This is a highly competitive industry, they have to be better than other render engines, period, or they will fail even if they have Nvidia's backing. Daz Studio is obviously not their only partner, though it is the most visible partner in every search for Iray.
When you start a CPU-only render, it starts right away, because the scene is already in RAM where the CPU accesses it. To render on a GPU, the scene has to be sent to the GPU first, and this process involves several tasks, such as how CUDA converts the textures into its own proprietary format, and compression. I am thinking something in this step is involved in the performance loss. It would explain why the CPUs appear to be unaffected.
I have tested compression. The default compression settings are very aggressive. I set them so that no texture at 8K or under would get compressed, and all the textures in the scene were under 8K. The scene loads faster, which is logical, but it didn't impact the render speed much. It did seem to render just a hair faster, but the margin was so small it could be within error, certainly not enough to explain the issue. So compression is not the problem. But we still have the translation Iray performs, and I don't think we can test for that.
Allow me to be the first 'meh' reaction to the 4090. I got mine yesterday, and I've yet to see it impress me over my 3090. I'm not sure if the complex scenes I'm rendering are bottlenecked elsewhere, but I thought doing the same benchmark 2 years later would have garnered more than 10 seconds of difference. ¯\_(ツ)_/¯
Edit: Added iteration rate for 4090.
Your problem is simple: you are comparing the 4090 running Daz 4.21 to the 3090 running Daz 4.14. It isn't that your hardware is terrible; you made the mistake of "upgrading" to the significantly slower Iray that is packed with 4.20/21. If you compare the 3090 and the 4090 in the same version of Iray, the difference is larger. It isn't the doubling that I had hoped for, but still a solid gain.
If you kept 4.14 or made a backup of 4.15 or 4.16.0.3, then you can go back to those and the 4090 will hit much higher iteration counts across the board, in this scene and probably all of your own scenes. Any version of Daz after 4.16.0.3 has an updated Iray plugin that has been proven time and again to be slower than 4.14-4.16, in my testing as well as others'. To be clear, there are some versions of 4.16 that have the newer Iray; only 4.16.0.3 still had the older Iray. I think the others were betas.
If you like using caustics, then 4.21 might not be so bad, and the VBD support for volumetrics came in 4.20. But if you don't care for these, then I'd go back if you can.