Iray Rendering - is SLI finally broken? (+some render artifacts)
Sorry for the attention-grabbing title. However, either I'm doing something completely wrong or something major is broken in the new Iray in the Studio 4.20.0.2 release. Up to the previous version, SLI worked as far as I can tell, even with memory pooling.
I am using 2 x RTX 2080 Ti with the latest Nvidia Studio Driver, version 511.65.
- Got the new release notification, so I upgraded via DIM.
- Started a render, it finished, then started another render -> DAZ Studio switches to CPU rendering.
So I started digging into the log files and found a plethora of error messages, which can be summarized by one central message:
[WARNING] :: ..\..\..\..\..\src\pluginsource\DzIrayRender\dzneuraymgr.cpp(367): Iray [ERROR] - IRAY:RENDER :: 1.27 IRAY rend error: CUDA device 0 (NVIDIA GeForce RTX 2080 Ti): an illegal memory access was encountered (while launching CUDA renderer in <internal>:937)
Rendering with Iray without SLI doesn't produce this error message; several subsequent renders stay on the GPU(s).
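To see at a glance which GPU fails and how often, the Iray per-device lines like the one quoted above can be tallied from the log. The following is a minimal sketch, assuming the log lines keep the `IRAY rend error: CUDA device N (...)` / `IRAY rend warn : CUDA device N (...)` shape seen in this thread; the script name in the usage comment is hypothetical, and the log format may differ between Studio versions.

```python
import re
import sys
from collections import Counter

# Matches Iray device lines like the ones quoted in this thread, e.g.
#   "IRAY rend error: CUDA device 0 (NVIDIA GeForce RTX 2080 Ti): an illegal memory access ..."
#   "IRAY rend warn : CUDA device 1 (NVIDIA GeForce RTX 2080 Ti) is no longer available ..."
IRAY_DEVICE_RE = re.compile(
    r"IRAY rend (?P<level>error|warn)\s*: CUDA device (?P<dev>\d+) "
    r"\((?P<name>[^)]*)\):?\s*(?P<msg>.*)"
)


def summarize_iray_errors(lines):
    """Count Iray messages per (device, level) and note devices that hit an illegal memory access."""
    counts = Counter()
    illegal_access = set()
    for line in lines:
        match = IRAY_DEVICE_RE.search(line)
        if match is None:
            continue  # not an Iray per-device message
        device = int(match.group("dev"))
        counts[(device, match.group("level"))] += 1
        if "illegal memory access" in match.group("msg"):
            illegal_access.add(device)
    return counts, illegal_access


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python iray_errors.py <path to the DAZ Studio log file>
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        counts, illegal = summarize_iray_errors(log)
    for (device, level), n in sorted(counts.items()):
        print(f"CUDA device {device}: {n} x {level}")
    print("illegal memory access on devices:", sorted(illegal))
```

Counting per device rather than just grepping makes it easier to compare an SLI/pooling run against a non-pooled run: in the failing case both devices should show up in the illegal-access set.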
---
There are several other Iray-related error messages, which you can see in the attached log files. I have omitted them from the explanation above because they seem to have little or no impact.
However, there is one thing that startles me, and it shows up when using "Matte Fog" and "Atmospheric Ground Fog": the fog is applied differently in different areas of the render. There are also some ghosts in the image. Please see the attached images below.
So my questions are:
- Is anybody out there with an SLI setup experiencing these memory error messages?
- Are the render artifacts caused by my setup, or is this a common thing that other users see as well?
Thanks a lot to anybody who is willing to have a look at my problem!
Comments
The official position has always been that SLI is not supported and should be turned off. The last few versions of Iray have supported NVLink, but as far as I know that requires at least a 3090.
The 2080 Ti uses NVLink, and before 4.20 memory pooling via NVLink worked perfectly fine.
I had the same error and fixed it by disabling memory pooling in DAZ Studio.
But this is just a workaround, since you lose memory pooling.
Before continuing with the outcome of my "research", I want to say a big thanks to the developers of DAZ Studio who make the effort to integrate Iray as a renderer.
Also thanks to gerster, who pointed out that disabling memory pooling is enough to circumvent the erroneous behavior.
Still, I am not convinced that SLI setups of 2080/3090 cards (via the NVLink bridge, of course) should simply be written off as unsupported.
AFAIK Iray relies on its own unified memory model, and my system satisfies the prerequisites for P2P.
As soon as I start rendering a large scene, I can see a lot of data exchange over the bridge.
As mentioned in my first post, there are two problems that actually CAN be caused by errors or oversights in the implementation:
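One way to double-check the NVLink side of the P2P prerequisites mentioned above from the command line is `nvidia-smi topo -m`, which prints a GPU connectivity matrix in which NVLink connections appear as `NV#`. The sketch below parses that matrix; it assumes the usual layout (a header row `GPU0 GPU1 ...`, one row per GPU, then a legend), which can vary between driver versions. Note that a visible NVLink connection only shows the bridge is up, not that Iray is actually pooling memory over it.

```python
import re
import subprocess

ANSI_RE = re.compile(r"\x1b\[[0-9;]*m")  # some nvidia-smi versions add terminal formatting
GPU_RE = re.compile(r"GPU\d+")           # "GPU0", "GPU1", ... (not e.g. a "GPU NUMA ID" column)


def parse_topo_matrix(text: str) -> dict:
    """Parse the GPU rows of `nvidia-smi topo -m` into {(row_gpu, col_gpu): link_type}."""
    header = []
    links = {}
    for raw in text.splitlines():
        tokens = ANSI_RE.sub("", raw).split()
        if len(tokens) < 2:
            continue  # blank lines, "Legend:"
        if not header:
            # Header row: "GPU0 GPU1 ... CPU Affinity"
            if GPU_RE.fullmatch(tokens[0]) and GPU_RE.fullmatch(tokens[1]):
                header = [t for t in tokens if GPU_RE.fullmatch(t)]
            continue
        if not GPU_RE.fullmatch(tokens[0]):
            break  # end of the GPU rows (NIC rows or the legend follow)
        for col, link in zip(header, tokens[1:1 + len(header)]):
            links[(tokens[0], col)] = link
    return links


def nvlink_pairs(links: dict) -> list:
    """GPU pairs whose connection type is NV# (a bonded set of # NVLinks)."""
    return sorted({tuple(sorted(pair)) for pair, link in links.items() if link.startswith("NV")})


def query_topology() -> dict:
    """Run `nvidia-smi topo -m` (nvidia-smi must be on PATH) and parse the result."""
    result = subprocess.run(["nvidia-smi", "topo", "-m"], capture_output=True, text=True, check=True)
    return parse_topo_matrix(result.stdout)
```

On a bridged pair, `nvlink_pairs(query_topology())` would be expected to return `[("GPU0", "GPU1")]`; an empty list suggests the cards are only talking over PCIe.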
- a visually flawed render
- a memory leak - maybe a forgotten cleanup after the render
If there were some official information about the state of SLI, besides oral tradition, one could at least figure out whether Studio users should complain to Nvidia, wait for a programming error to be fixed, or sell the second GPU for more than it cost 2 years ago.
Have you tried an older driver?
"mi_plugin_factory" not found: the DLLs are from the latest DAZ Studio release. Maybe a symbol left over from development?
"the parametric approximation level is set to 4. The original value of 5 would produce too much geometry in a single mesh.": It's a pre-made scene, "Enclosed Side Garden". DAZ Studio complains but continues anyway. The render behavior doesn't change...
"Have you tried an older driver?": 472.47, which I had at hand, rendered correct alphas but did not use the bridge with the test scene above. So I forced memory pooling with a larger scene, and as soon as it kicked in, it failed with:
You are not alone. I've been using 2 x 2080 Ti with NVLink in Daz for years, and in more than 3 of the last Daz betas (and now the release version), with plenty of different Nvidia drivers, I get the same result as you. The first render is fine; all other renders fail until I restart the app.
It's either the Iray driver or the Daz application that doesn't free the resources on the cards after the first render. Now that the Iray driver no longer reports the memory used for textures on the cards, we can't verify that memory sharing works anymore. But since closing Daz and reopening it works (even though it's not an instant operation), we know it's not the cards themselves that are left in an invalid state; it's either the app or the Iray "plugin".
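Even without Iray reporting texture memory, the "resources not freed" theory above can be probed from outside Daz by comparing each card's used VRAM before and after the first render. This is a minimal sketch using `nvidia-smi --query-gpu` (nvidia-smi must be on PATH); the helper names are hypothetical. Iray may legitimately keep scene data resident between renders, so leftover usage on its own is consistent with a leak but does not prove one.

```python
import subprocess

# Used/total memory per GPU, in MiB, as plain CSV without a header row.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_memory_csv(text: str) -> dict:
    """Map GPU index -> (used MiB, total MiB) from the CSV that QUERY produces."""
    usage = {}
    for line in text.strip().splitlines():
        index, used, total = (field.strip() for field in line.split(","))
        usage[int(index)] = (int(used), int(total))
    return usage


def snapshot() -> dict:
    """Run nvidia-smi once and return the current memory usage per GPU."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_memory_csv(result.stdout)


def memory_delta(before: dict, after: dict) -> dict:
    """Change in used MiB per GPU between two snapshots (positive = more memory held)."""
    return {gpu: after[gpu][0] - before[gpu][0] for gpu in before if gpu in after}
```

A possible workflow: call `baseline = snapshot()` with the scene loaded, run the first render, close the render window, then print `memory_delta(baseline, snapshot())`. A large delta that only disappears when Daz is closed would match the behavior described above.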
I've been using NVLink (only for Daz) since May 2020, when I got my second 2080 Ti (as shown here: https://www.daz3d.com/forums/discussion/comment/5628066/#Comment_5628066 ).
In Sept 2020, it was said that only the 3090 would support NVLink in the 3000 generation. Maybe the "Iray plugin" was modified to help sell more 3090s or future RTX 4000 cards, but it's sad to lose that feature and some performance (51.92 seconds vs. 56.78 seconds for an easy scene).
NVLink on the 2080 Ti never got any use in games, so I only bought it for Daz, and now I have to stop using it? I would like a menu option that keeps the scene and everything that is loaded and ready in memory, BUT drops the plugin and resets it as if I were starting Daz, then loads everything back onto the cards so I could launch another render... if that is possible.
What is certain is that "NVLink Peer Group Size : 2", which has worked on 2 x 2080 Ti since before May 2020, no longer works since the last beta branch. (I've seen texture problems, and changing render settings after the first render to try to reset it simply crashes Daz Studio.)
The complexity of the scene is not the issue...
This is from tonight, with a single Genesis 8.0 character and not a single change to properties or rotation between the 2 renders:
2022-02-21 18:28:14.959 [WARNING] :: ..\..\..\..\..\src\pluginsource\DzIrayRender\dzneuraymgr.cpp(367): Iray [ERROR] - IRAY:RENDER :: 1.7 IRAY rend error: CUDA device 1 (NVIDIA GeForce RTX 2080 Ti): an illegal memory access was encountered (while launching CUDA renderer in <internal>:937)
2022-02-21 18:28:14.959 [WARNING] :: ..\..\..\..\..\src\pluginsource\DzIrayRender\dzneuraymgr.cpp(367): Iray [ERROR] - IRAY:RENDER :: 1.7 IRAY rend error: CUDA device 1 (NVIDIA GeForce RTX 2080 Ti): Failed to launch renderer
2022-02-21 18:28:14.959 [WARNING] :: ..\..\..\..\..\src\pluginsource\DzIrayRender\dzneuraymgr.cpp(367): Iray [ERROR] - IRAY:RENDER :: 1.8 IRAY rend error: CUDA device 0 (NVIDIA GeForce RTX 2080 Ti): an illegal memory access was encountered (while copying device buffer)
2022-02-21 18:28:14.960 [WARNING] :: ..\..\..\..\..\src\pluginsource\DzIrayRender\dzneuraymgr.cpp(367): Iray [ERROR] - IRAY:RENDER :: 1.11 IRAY rend error: CUDA device 1 (NVIDIA GeForce RTX 2080 Ti): Device failed while rendering
2022-02-21 18:28:14.960 [WARNING] :: ..\..\..\..\..\src\pluginsource\DzIrayRender\dzneuraymgr.cpp(367): Iray [WARNING] - IRAY:RENDER :: 1.11 IRAY rend warn : CUDA device 1 (NVIDIA GeForce RTX 2080 Ti) is no longer available for rendering.
2022-02-21 18:28:14.960 [WARNING] :: ..\..\..\..\..\src\pluginsource\DzIrayRender\dzneuraymgr.cpp(367): Iray [ERROR] - IRAY:RENDER :: 1.2 IRAY rend error: CUDA device 0 (NVIDIA GeForce RTX 2080 Ti): Failed to merge device frame buffer into master buffer
2022-02-21 18:28:14.960 [WARNING] :: ..\..\..\..\..\src\pluginsource\DzIrayRender\dzneuraymgr.cpp(367): Iray [ERROR] - IRAY:RENDER :: 1.2 IRAY rend error: CUDA device 0 (NVIDIA GeForce RTX 2080 Ti): Device failed while rendering
2022-02-21 18:28:14.960 [WARNING] :: ..\..\..\..\..\src\pluginsource\DzIrayRender\dzneuraymgr.cpp(367): Iray [WARNING] - IRAY:RENDER :: 1.2 IRAY rend warn : CUDA device 0 (NVIDIA GeForce RTX 2080 Ti) is no longer available for rendering.
2022-02-21 18:28:14.960 [WARNING] :: ..\..\..\..\..\src\pluginsource\DzIrayRender\dzneuraymgr.cpp(367): Iray [ERROR] - IRAY:RENDER :: 1.11 IRAY rend error: CUDA device 1 (NVIDIA GeForce RTX 2080 Ti): an illegal memory access was encountered (while initializing memory buffer)
After that, I close Daz, start it again, load a single Genesis 8.0 character, and it renders very fast. I start another render, and it fails again.
Also of interest: I've seen a thread from someone who had trouble rendering because they had an Iray viewport open (which I don't use). So I guess the viewport was the first render and thus worked, and the second one (the real render from the Render button) got the same "Device failed while rendering" or illegal memory access error I get on my second run...
It makes no difference where they are from; that is an Nvidia/Iray problem, and apparently it doesn't only affect DS (Google the phrase).
Premade scenes can also have settings that make no sense. SubD 5 is just a waste of resources in a scene like that, and depending on what has SubD 5, it may eat a substantial amount of VRAM on both cards, as geometry is not pooled between the cards.
The error says there is an error in the code, which means something is not working as it should. Usually there is a comma or bracket missing on some line of the code, making that part of the code unintelligible to DS and the computer.
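To put a number on why SubD 5 is so expensive: assuming the SubD level counts rounds of Catmull-Clark subdivision on an all-quad mesh, each round splits every quad into four, so one extra level quadruples the geometry. The mesh size below is a hypothetical example, not taken from the scene discussed here.

```python
def catmull_clark_quads(base_quads: int, level: int) -> int:
    """Quad count of an all-quad mesh after `level` rounds of Catmull-Clark subdivision."""
    # Every round splits each quad into four smaller quads.
    return base_quads * 4 ** level


def growth_table(base_quads: int, max_level: int):
    """(level, quads) pairs showing how fast each extra SubD level grows the mesh."""
    return [(lvl, catmull_clark_quads(base_quads, lvl)) for lvl in range(max_level + 1)]


if __name__ == "__main__":
    # Hypothetical 10,000-quad mesh.
    for level, quads in growth_table(10_000, 5):
        print(f"SubD {level}: {quads:,} quads")
    # SubD 4 -> SubD 5 multiplies the geometry by 4; since geometry is not
    # pooled, each card has to hold its own copy of the larger mesh.
    print(catmull_clark_quads(10_000, 5) // catmull_clark_quads(10_000, 4))  # 4
```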
I restored Studio 4.16 to compare it side by side with 4.20, both with the latest Nvidia driver. Both versions use memory pooling in the above scene (at least I see data crossing the GPU bridge).
4.16: No visual artifacts, render time 84.3 sec
4.20: Visual artifacts, memory leak, render time 113.8 sec
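For scale, the two timings above can be turned into a relative slowdown. This is just the arithmetic on the reported numbers, nothing measured beyond them.

```python
def slowdown_percent(old_seconds: float, new_seconds: float) -> float:
    """Relative increase in render time, in percent (positive = slower)."""
    return (new_seconds - old_seconds) / old_seconds * 100.0


if __name__ == "__main__":
    # Render times reported above for the same scene with memory pooling.
    print(f"4.16 -> 4.20: {slowdown_percent(84.3, 113.8):.1f}% slower")  # 4.16 -> 4.20: 35.0% slower
```

So on this scene 4.20 takes roughly a third longer, on top of the artifacts and the leak.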
For once, this time it doesn't seem to be Nvidia's fault.
Iray is still Nvidia.
It's not only the drivers, but also the version of Iray in DS.
You are right, SubD 5 is certainly a waste of resources. However, it doesn't play a role, except that it helps expose the problem, since the scene alone is enough to force memory pooling. See the log snippet below from Studio 4.16, where memory usage is printed in the log.
I also found this in the logs from version 4.16, so maybe it's some preloading stuff I messed up. The parser states that it happens at line 1, position 1, and the loaded scene has a "{'" at this position. Again, it's nothing that contributes to the problem, as I found out with 4.16.
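To rule out a malformed scene file independently of Studio, one can check whether it parses as JSON at all. This sketch assumes the scene is a DSON `.duf` file, i.e. JSON that may be gzip-compressed; it only tests JSON validity, which may not be what the Studio parser message refers to. The function name is hypothetical.

```python
import gzip
import json

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip stream


def check_scene_json(path: str):
    """Return None if the file parses as JSON, else (line, column, message) of the first error."""
    with open(path, "rb") as handle:
        data = handle.read()
    if data[:2] == GZIP_MAGIC:
        data = gzip.decompress(data)  # compressed .duf: inflate before parsing
    try:
        # Plain "utf-8" (not "utf-8-sig") so a byte-order mark is reported as an error at 1:1.
        json.loads(data.decode("utf-8"))
    except UnicodeDecodeError as err:
        return (None, None, f"not valid UTF-8: {err.reason}")
    except json.JSONDecodeError as err:
        return (err.lineno, err.colno, err.msg)
    return None
```

If this returns `None` for the scene, the file itself is well-formed JSON and the message at line 1, position 1 is more likely coming from something else Studio loads.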
Relevant log part when rendering with V4.16:
OK, that isn't much.
Ack, I think they are sometimes not very willing to tackle technical challenges.
In order to use memory pooling, multiple GPUs should be run in TCC mode. WDDM will not allow this reliably. You need the NVIDIA SMI utility to set this up. The problem lies in Windows and WDDM. Bear in mind that TCC mode will not let you use either card to drive a monitor, so you need a third card (set to WDDM) to run the monitor. The 2080 Ti can be run in TCC mode, although it is a bit sketchy. Titan RTX and Turing-based Quadros also support TCC.
I run two Titan RTX cards in TCC mode in one of my workstations, allowing scene sizes up to 48 GB with no issues. I use an old Titan Xp to drive the monitor. In the other (identical) workstation I run a single 3090.
I did a clean install of DAZ 4.20.0.2, completely removing it first (including its entries in the registry) and then installing fresh. I had heard the upgrade path through DIM has caused issues. I have no issues on either workstation.
As far as I am aware, the NVLink method allows memory pooling (for materials) regardless of driver. But of course that is available only for a very small number of recent, high-end cards.
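The NVIDIA SMI step described above boils down to nvidia-smi's `-dm`/`--driver-model` option on Windows (0 = WDDM, 1 = TCC), plus the `driver_model.current`/`driver_model.pending` query fields to check the result. The sketch below only builds and runs those commands; the helper names are hypothetical, whether a given GeForce card accepts TCC is not guaranteed, and the switch generally needs an elevated prompt and a reboot.

```python
import subprocess

# nvidia-smi driver-model values on Windows: 0 = WDDM, 1 = TCC.
DRIVER_MODELS = {"WDDM": 0, "TCC": 1}


def driver_model_command(gpu_index: int, model: str) -> list:
    """Build the nvidia-smi call that switches one GPU's driver model (Windows only)."""
    return ["nvidia-smi", "-i", str(gpu_index), "-dm", str(DRIVER_MODELS[model])]


def current_driver_models() -> str:
    """Return nvidia-smi's CSV listing of current and pending driver model per GPU."""
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=index,name,driver_model.current,driver_model.pending",
            "--format=csv",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

For example, `subprocess.run(driver_model_command(1, "TCC"), check=True)` from an Administrator prompt would request TCC for GPU 1. Never switch the card that drives the monitor, and run `current_driver_models()` before and after the reboot to confirm the pending value took effect.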