Nvidia driver - VRAM management change - heads-up
Just a note about what I've read regarding Nvidia GeForce drivers (I am not sure about the Studio drivers).
Nvidia has changed the way low-VRAM situations are handled. According to Nvidia, the change was introduced in version 536.40.
A Stable Diffusion forum notes that the latest driver not affected by the change is version 531.79.
When getting close to an out-of-VRAM situation, the driver moves data from VRAM to system RAM. Because this is based on an internal threshold, it can happen even though the VRAM is not fully used.
The issue is that system RAM is far slower than VRAM and, depending on the software used, the fallback might not be stable at all.
This is a big issue when using Stable Diffusion: it causes extremely slow image generation, and in my case it also caused a large memory leak that completely stalled the Stable Diffusion engine.
Maybe this also affects DS in one way or another.
There is now a new driver version that allows you to disable the system RAM fallback: version 546.01.
A link explaining how to do this is below (it is written for the Stable Diffusion Python runtime, but the concept should work for any software).
https://nvidia.custhelp.com/app/answers/detail/a_id/5490
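If you want to check quickly whether the fallback is active for a given app, here is a minimal sketch; it assumes a Python environment with PyTorch and CUDA, and the ~1 GB chunk size is arbitrary. It tries to allocate past the physical VRAM size: with the fallback on this should succeed (slowly), and with it off it should fail with a CUDA out-of-memory error.

import torch

# Report the physical VRAM size of the first GPU.
props = torch.cuda.get_device_properties(0)
print(f"{props.name}: {props.total_memory / 2**30:.1f} GB VRAM")

blocks = []
try:
    # Grab ~1 GB chunks until we are past 120% of the physical VRAM size.
    target_bytes = int(props.total_memory * 1.2)
    while sum(b.numel() * b.element_size() for b in blocks) < target_bytes:
        blocks.append(torch.empty(2**28, dtype=torch.float32, device="cuda"))
    print("Allocated past VRAM size -- sysmem fallback appears to be ON")
except RuntimeError as err:  # CUDA OOM surfaces as a RuntimeError
    print("Allocation failed -- sysmem fallback appears to be OFF:", err)
finally:
    del blocks
    torch.cuda.empty_cache()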
Again, just a heads-up.
Comments
Got it...thanks! I haven't seen that in Studio Driver yet...
I think it has to be enabled per app using custom profiles. Not sure if Studio is supported; it might be an option for some, but it's not a can of worms I'm going to open.
Hmm... This could explain some of the problems people have been having...
I checked some related info: https://medium.com/@usamakenway/increase-cuda-memory-with-sysmem-fallback-policy-bb0011b3d7ec ... it's mostly for training purposes.
It's in Global Settings, and only with the Game Ready driver so far. I use Stable Diffusion as well, and I have not encountered the above issue so far with the Studio Driver.
This feature is not good for rendering...
I doubt it will affect Iray rendering. AFAIK, data is loaded once per render, not continually: either the scene fits into VRAM or it doesn't. There hasn't been any mention of this on the Iray blog, either.
Other render engines that use system as well as video RAM might do things differently.
No, it does not affect Iray rendering for now. Even if it's technically achievable, it would have to be supported in the application via SDK development. I just meant that falling back to RAM is not a good technique for rendering.
Yeah, it basically does to other applications what Iray does when it drops to CPU rendering, except it triggers when merely getting close to the VRAM limit. So training a Stable Diffusion model goes to hell: a model that should take minutes or hours to train ends up taking days or weeks, lol.
The latest Studio driver has both the global setting and the per-app setting, so I tried a DS render in both "Driver Default" and "Prefer Sysmem Fallback" modes.
Five random HD figures with different bases, and texture compression set to 4k/8k. Both renders used all 24 GB of VRAM and 41 GB of system RAM.
Which version of the Studio Driver has the system fallback option, 537.70? I'm using 537.58 with an RTX A6000.
So total VRAM+RAM consumption was 65 GB, with just 5 Genesis 8/9 figures...? What SubD render level did you set?
Does that prove you did not really run out of VRAM, i.e. the DS render didn't fall back to CPU at all and the GPU just pushed 41 GB to RAM? Was there any performance drop?
@crosswind As I understand it, 546.01 lets you enable or disable the CUDA system memory fallback in the driver settings per application. That means Iray should not be aware of the difference, because the memory mapping happens at the driver level, with no way for the application to know about it. Where did you read that it needs to be supported by the application code? Can you provide a reference, please?
Not that I don't trust you; rather, I'm interested in understanding more about this. Thank you.
P.S. From the few docs referenced, it seems CUDA can now use shared memory, which wasn't possible before. The driver probably swaps data between system and GPU memory, since CUDA cores can't reach system memory directly due to hardware limitations.
It is not clear to me whether this is limited to the CUDA working buffer and tensors, or whether it extends to textures and geometry. I'm not familiar with AI training, but I understand Stable Diffusion uses about a 6 GB working buffer while Iray uses about 2 GB, so if this is limited to the working buffer then the advantage for Iray would be minimal (a rough way to probe the footprint from Python is sketched at the end of this post).
https://nvidia.custhelp.com/app/answers/detail/a_id/5490/~/system-memory-fallback-for-stable-diffusion
https://medium.com/@usamakenway/increase-cuda-memory-with-sysmem-fallback-policy-bb0011b3d7ec
P.P.S. Unfortunately I'm not in a good position to test this, because my system has limited memory. All my system memory is already used by Daz Studio to pass the data to Iray when I load a moderately complex scene, so even if the driver could swap memory there wouldn't be any free RAM available; it would go to disk swapping, and that's not an option.
If we can confirm this works for Iray, then I can easily upgrade my system with more RAM, which is essentially why I'm asking here for others who can test. As I understand it, you need a system where RAM = 4x VRAM or more.
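As a rough probe of the footprint question above, here is a sketch; it assumes PyTorch, and the matrix workload is just a stand-in for real model or texture data, not Iray or SD itself. It compares the driver's free-VRAM report before and after putting work on the GPU.

import torch

# Free/total VRAM as reported by the driver, in bytes.
free_before, total = torch.cuda.mem_get_info(0)

x = torch.randn(4096, 4096, device="cuda")  # stand-in for persistent data (weights/textures)
y = x @ x                                   # stand-in for a compute pass with temporaries
torch.cuda.synchronize()

free_after, _ = torch.cuda.mem_get_info(0)
print(f"total VRAM: {total / 2**30:.2f} GB")
# Note: PyTorch's caching allocator holds freed blocks, so this is an upper bound.
print(f"held by this workload: {(free_before - free_after) / 2**30:.2f} GB")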
546.01
Whatever level they loaded with. A clothed G8F, G8 Drago, Horse 3, a dog with fur, a G9 creature, and a set.
CPU fallback is off, always is.
It didn't run out of VRAM; it just used as much as it could, according to the texture compression settings. Those are normal numbers for my system. Performance was normal.
Enabling it made no difference for one render on my PC but it's hard to run it out of resources even when trying. A better test would be with a GPU that often falls back to CPU.
Oh... an NFB version; I saw nothing about it in the release notes document. Normally I don't touch NFB drivers, only WHQL. Sorry, I didn't see it stated anywhere; I just assumed it works the usual way. E.g., when NVLink was enabled, it needed NVLink Peer Group support from DS. If the new mechanism is handled by the driver alone this time, then I was wrong. Don't be misled by me.
Anyway, I'm hesitant to update and test...
Thanks for the info. I'd like to run some tests, though I'm hesitant. I wonder whether the driver, by design, does or does not exhaust VRAM before pushing consumption to RAM...
I'm wondering about that too. I just set 546.01 to "Prefer No System Fallback." I'll have to test my apps now. PS has been partially freezing when navigating with the space bar; could be a Wacom driver issue. Looking forward to seeing how it performs after a reboot. I stopped a render after over 17 hours the other day; it kept loading lens data. I ended up deleting some out-of-camera props and changing some characters' mesh resolution to base, and then it rendered at 4K in about 30 minutes. Nvidia gave me a Quadro with 50 TB of VRAM. It was awesome. Sadly, I woke up. It was nice while it lasted.
I saw this driver release, but I thought it still needed software support, so I didn't look into it. There is only one way to test this out: make a scene you know will not fit in your VRAM and hit the render button.
As I happen to own a 3060 and a 3090, I have multiple scenes I have made that exceed my 3060's VRAM, so I am in a unique position to test this out.
Looking into the help log, with this enabled you will find this line:
IRAY:RENDER :: 1.3 IRAY rend info : CUDA device 0 (NVIDIA GeForce RTX 3090): Optimizing for cooperative usage (performance could be sacrificed)
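(In case anyone wants to search their own log for that string, here's a throwaway sketch; the log path below is the usual Windows default for DS 4.x and is an assumption, so adjust it to your install.)

from pathlib import Path

# Usual Windows default for the DS 4.x log; an assumption - adjust to your install.
log_path = Path.home() / "AppData/Roaming/DAZ 3D/Studio4/log.txt"
for line in log_path.read_text(errors="ignore").splitlines():
    if "Optimizing for cooperative usage" in line:
        print(line)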
This got me excited. Alright let's do this!
Sadly, the excitement didn't last. I was unable to get the 3060 to render a scene too large for it. It isn't like the scene is super huge, either. All I did was hide/unhide a character, and that was enough to break the VRAM limit on the 3060 in this scene, so we're only talking maybe a gig at most over its limit.
I tried multiple things. I tried rendering with both my 3090 and 3060, no dice. I kind of figured it wouldn't work that way, but I hoped.
I tried the 3060 alone, then I tried having the 3060 and CPU both checked as rendering devices thinking that was needed. I restarted DS multiple times, and tried with the setting enabled and disabled to see if there was any difference. I even tried swapping the GPU that ran the display to see if that helped.
The good news is that it doesn't seem to affect rendering speed in my testing. In fact, my fastest 3090 times were with the fallback enabled. It might alter the time it takes to load the scene into VRAM, but just by a second or two. I also didn't see a difference in VRAM use. Again, I cycled through hiding and unhiding a character, and the behavior was the same whether the fallback was enabled or not (and after restarting DS): if the character was hidden, the 3060 worked; if it was visible, the 3060 dropped out. So if there is any window for extra VRAM use, it might be relatively small. The character had less than a gig of impact on the 3090. Without this character, the 3060 was using 11.25 GB when it rendered.
One thing I noticed is that the help log note I described above only showed up for the 3090; indeed, notice it only lists the 3090 on that line. I could not get the notice to show up for the 3060, so this is likely a bug. I do not believe this is a 3090-only thing, because the dev notes clearly mention using a 6 GB GPU for Stable Diffusion, which is obviously a lesser-tier product.
Maybe it is my setup with 2 GPUs. But we also know that this driver update is really just targeting Stable Diffusion; after all, it is even called that on the web page linked, "System Memory Fallback for Stable Diffusion". Perhaps they want it to work with other CUDA apps, but their main target is SD. Perhaps a future driver update will get it working properly. But for now, it appears to have no use for Daz Studio Iray.
Thanks for the test and feedback! Then I'll just let it go...
IRAY:RENDER :: 1.3 IRAY rend info : CUDA device 0 (NVIDIA GeForce RTX 3090): Optimizing for cooperative usage (performance could be sacrificed)
is not new. I think it just means you are using the same graphics card to drive your monitor and for Iray rendering.
@outrider42
Thank you for testing this. My system is not a good testbed because I have limited RAM and Daz Studio takes it all, so there's little to fall back to. It may be that the CUDA system memory fallback is limited to the working buffer, as I noted above; in that case it is of little use for Iray, since textures and geometry are what matter most. But it's good for Stable Diffusion and AI training in general.
One more test you could do is to take out your 3090 and leave only the 3060 plugged in. But I'm not asking you to; it's up to you if you want to dig deeper. Thank you again.
Yes, I believe that is correct.
I might, but perhaps another time. I believe Stable Diffusion only works with one GPU, so this driver may only work correctly with one GPU installed. However, I did not have any issues running SD locally; it simply lets me pick which GPU to use. I have not tested this driver with SD, as I uninstalled SD a while back.
Some behavior I noticed when rendering out of VRAM is that the 3060 will attempt to start up every 10 iterations. The clock speed boosts for a moment, but it instantly drops back down and never actually renders a frame. I don't think this is new, but I do not recall DS showing this information in the progress window like it does now; however, that could just be a difference with this version of DS (4.21). If I run out of VRAM on the 3060, I stop the render and uncheck it as a rendering device, or I disable it before starting because I know it will run out of VRAM. I have found that it renders faster this way than leaving the render going with the 3060 still enabled as a render device.
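(If anyone wants to watch that start-stop behavior live, polling nvidia-smi works; here's a minimal Python wrapper, assuming nvidia-smi is on your PATH.)

import subprocess

# Print clocks and memory for every GPU as CSV, repeating every 2 seconds
# (Ctrl+C to stop). Run this alongside a render to watch the pattern.
subprocess.run([
    "nvidia-smi",
    "--query-gpu=index,name,clocks.sm,memory.used,memory.total",
    "--format=csv",
    "-l", "2",
])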
I have to say it's funny that Nvidia is finally doing something about memory in 2023. It really just shows how huge AI is for them. Think about it: GPU rendering has been around for a very long time now, but it is only now, in the middle of AI's peak popularity, that Nvidia ships a driver to address out-of-VRAM issues. Given that this driver feature is focused on Stable Diffusion, I don't know if they will try to make it work for renderers like Iray. Of course we can hope. Perhaps a question for the Iray Dev Team?