GPU changes to CPU while rendering in Iray

Hello,
There have been some threads about the same problem, but I haven't found a solution in them. Maybe there is some new information, or I missed something.
My problem:
After ~7 minutes of rendering in Iray, Daz switches from the GPU to the CPU. With the GPU (GTX 780 Ti, 3 GB VRAM) I can render in 4-8 minutes, but the CPU makes everything a lot slower.
I can finish one picture that takes 5 minutes. If I render a second one without closing Daz, it switches at ~2-3 minutes.
2018-03-21 15:54:31.660 Iray INFO - module:category(IRAY:RENDER): 1.0 IRAY rend info : Received update to 00176 iterations after 162.344s.
2018-03-21 15:54:47.182 Iray INFO - module:category(IRAY:RENDER): 1.0 IRAY rend info : Received update to 00193 iterations after 177.867s.
2018-03-21 15:54:55.782 WARNING: dzneuraymgr.cpp(307): Iray ERROR - module:category(IRAY:RENDER): 1.6 IRAY rend error: CUDA device 0 (GeForce GTX 780 Ti): Kernel [18] failed after 0.039s
2018-03-21 15:54:55.783 WARNING: dzneuraymgr.cpp(307): Iray ERROR - module:category(IRAY:RENDER): 1.6 IRAY rend error: CUDA device 0 (GeForce GTX 780 Ti): an illegal memory access was encountered (while launching CUDA renderer in core_renderer_wf.cpp:807)
2018-03-21 15:54:55.783 WARNING: dzneuraymgr.cpp(307): Iray ERROR - module:category(IRAY:RENDER): 1.6 IRAY rend error: CUDA device 0 (GeForce GTX 780 Ti): Failed to launch renderer
2018-03-21 15:54:55.783 WARNING: dzneuraymgr.cpp(307): Iray ERROR - module:category(IRAY:RENDER): 1.2 IRAY rend error: CUDA device 0 (GeForce GTX 780 Ti): Device failed while rendering
2018-03-21 15:54:55.783 WARNING: dzneuraymgr.cpp(307): Iray WARNING - module:category(IRAY:RENDER): 1.2 IRAY rend warn : All available GPUs failed.
2018-03-21 15:54:55.784 Iray INFO - module:category(IRAY:RENDER): 1.2 IRAY rend info : Falling back to CPU rendering.
2018-03-21 15:54:55.784 WARNING: dzneuraymgr.cpp(307): Iray ERROR - module:category(IRAY:RENDER): 1.2 IRAY rend error: CUDA device 0 (GeForce GTX 780 Ti): an illegal memory access was encountered (while initializing memory buffer)
2018-03-21 15:54:55.784 WARNING: dzneuraymgr.cpp(307): Iray ERROR - module:category(IRAY:RENDER): 1.2 IRAY rend error: All workers failed: aborting render
2018-03-21 15:54:55.784 WARNING: dzneuraymgr.cpp(307): Iray ERROR - module:category(IRAY:RENDER): 1.2 IRAY rend error: CUDA device 0 (GeForce GTX 780 Ti): an illegal memory access was encountered (while de-allocating memory)
2018-03-21 15:54:55.784 WARNING: dzneuraymgr.cpp(307): Iray ERROR - module:category(IRAY:RENDER): 1.2 IRAY rend error: CUDA device 0 (GeForce GTX 780 Ti): an illegal memory access was encountered (while de-allocating memory)
2018-03-21 15:54:55.784 WARNING: dzneuraymgr.cpp(307): Iray ERROR - module:category(IRAY:RENDER): 1.2 IRAY rend error: CUDA device 0 (GeForce GTX 780 Ti): an illegal memory access was encountered (while de-allocating memory)
2018-03-21 15:54:55.784 WARNING: dzneuraymgr.cpp(307): Iray ERROR - module:category(IRAY:RENDER): 1.2 IRAY rend error: CUDA device 0 (GeForce GTX 780 Ti): an illegal memory access was encountered (while de-allocating memory)
2018-03-21 15:54:55.784 WARNING: dzneuraymgr.cpp(307): Iray ERROR - module:category(IRAY:RENDER): 1.2 IRAY rend error: CUDA device 0 (GeForce GTX 780 Ti): an illegal memory access was encountered (while de-allocating memory)
2018-03-21 15:54:55.784 WARNING: dzneuraymgr.cpp(307): Iray ERROR - module:category(IRAY:RENDER): 1.2 IRAY rend error: CUDA device 0 (GeForce GTX 780 Ti): an illegal memory access was encountered (while de-allocating memory)
2018-03-21 15:54:55.784 WARNING: dzneuraymgr.cpp(307): Iray ERROR - module:category(IRAY:RENDER): 1.2 IRAY rend error: CUDA device 0 (GeForce GTX 780 Ti): an illegal memory access was encountered (while de-allocating memory)
2018-03-21 15:54:55.784 WARNING: dzneuraymgr.cpp(307): Iray ERROR - module:category(IRAY:RENDER): 1.2 IRAY rend error: CUDA device 0 (GeForce GTX 780 Ti): an illegal memory access was encountered (while de-allocating memory)
2018-03-21 15:54:55.784 WARNING: dzneuraymgr.cpp(307): Iray ERROR - module:category(IRAY:RENDER): 1.2 IRAY rend error: CUDA device 0 (GeForce GTX 780 Ti): an illegal memory access was encountered (while de-allocating memory)
2018-03-21 15:54:55.784 WARNING: dzneuraymgr.cpp(307): Iray ERROR - module:category(IRAY:RENDER): 1.0 IRAY rend error: CUDA device 0 (GeForce GTX 780 Ti): an illegal memory access was encountered (while de-allocating memory)
2018-03-21 15:54:55.784 WARNING: dzneuraymgr.cpp(307): Iray ERROR - module:category(IRAY:RENDER): 1.0 IRAY rend error: CUDA device 0 (GeForce GTX 780 Ti): an illegal memory access was encountered (while de-allocating memory)
2018-03-21 15:54:55.785 WARNING: dzneuraymgr.cpp(307): Iray ERROR - module:category(IRAY:RENDER): 1.0 IRAY rend error: CUDA device 0 (GeForce GTX 780 Ti): an illegal memory access was encountered (while de-allocating memory)
2018-03-21 15:54:55.789 Iray INFO - module:category(IRAY:RENDER): 1.0 IRAY rend info : CPU: using 8 cores for rendering
2018-03-21 15:54:55.789 Iray INFO - module:category(IRAY:RENDER): 1.0 IRAY rend info : Rendering with 1 device(s):
2018-03-21 15:54:55.789 Iray INFO - module:category(IRAY:RENDER): 1.0 IRAY rend info : CPU
2018-03-21 15:54:55.789 Iray INFO - module:category(IRAY:RENDER): 1.0 IRAY rend info : Rendering...
2018-03-21 15:54:55.789 Iray INFO - module:category(IRAY:RENDER): 1.2 IRAY rend info : CPU: Scene processed in 0.000s
2018-03-21 15:54:55.789 Iray VERBOSE - module:category(IRAY:RENDER): 1.2 IRAY rend progr: CPU: Processing scene...
2018-03-21 15:54:55.791 Iray VERBOSE - module:category(IRAY:RENDER): 1.4 IRAY rend stat : Native CPU code generated in 9.06e-007s
So if I close Daz after rendering one picture, it seems to be OK - until 7-8 minutes of rendering. So I can't render pictures that take 10+ minutes...
Are there any solutions to this? People said in other threads that the VRAM is getting full. Can this really be normal?
I already set the "Render Target" to "Direct to File", so there is only the extra window with the progress while rendering. I changed the power options to "High performance".
I tried with OptiX Prime Acceleration on and off - no difference. The graphics card temperatures are OK.
I hope someone can help me out with this.
Comments
In my case, it happens when the scene uses too much swap memory (virtual memory, a RAM extension stored on the HDD).
I think the HDD is too slow for GPU processing.
Once the GPU has loaded the scene, there shouldn't be any CPU RAM swapping in GPU Only mode.
@Baudraufauf
The GTX 780 Ti doesn't have enough VRAM to render a scene with a lot of vertices/textures. Also, if you leave the first render open (i.e. don't cancel or save it), the GPU will probably not have enough memory left for the next render.
@Noah, I don't have an internal HDD, only one SSD, which should be fast enough.
@fastbike1, So there is no way to get a "complex" scene with my GPU?
If I try to render more complex stuff (Iray lighting etc.) at 300 iterations, it takes about ~10 minutes. But it doesn't look really good: many yellow points, really grainy. Does this mean I don't have enough iterations? If I use more, it switches to the CPU. Wouldn't people with 6 GB have the same problem after 20 or 30 minutes? Isn't there a way to "clean" the VRAM while rendering, or something else?
I also have a GTX 660 Ti (2 GB VRAM) in another PC - would it change anything if I used it together with the 780 Ti?
@Baudraufauf "At 300 iterations, it takes about ~10 minutes. But it doesn't look really good: many yellow points, really grainy. Does this mean I don't have enough iterations? If I use more, it switches to the CPU. Wouldn't people with 6 GB have the same problem after 20 or 30 minutes? Isn't there a way to "clean" the VRAM while rendering, or something else?"
More iterations do not use more VRAM. The VRAM is consumed at the beginning, when the scene is loaded onto the card.
Depending on what you mean by complex, you won't be able to fit a complex scene on your card. A close-up with 4096×4096 textures on the skin and hair may push you over the limit, especially if you are using emissive lights.
A single large model (e.g. city, landscape, fortress) with some characters will also need more VRAM, especially at high render resolutions. With a scene like this, the VRAM usage doesn't change if you are only rendering a small portion of the overall scene.
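As a rough illustration of why big textures push a 3 GB card over the edge - a back-of-envelope sketch only, assuming uncompressed 8-bit RGBA storage plus a mipmap chain, which is not necessarily how Iray actually lays textures out in VRAM:

```python
def texture_mib(width, height, channels=4, bytes_per_channel=1, mipmaps=True):
    """Rough VRAM cost of one uncompressed texture in MiB.

    Assumption: 8-bit RGBA storage; a full mipmap chain adds about
    one third on top of the base image.
    """
    size = width * height * channels * bytes_per_channel
    if mipmaps:
        size = size * 4 // 3
    return size / (1024 * 1024)

# One 4096x4096 map costs roughly 85 MiB this way; halving it to
# 2048x2048 drops that to roughly 21 MiB - a 4x saving per map,
# which is why close-ups with several 4096 maps add up quickly.
```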
EDIT: I changed my Iray settings so that I only use 50% of the VRAM; it still crashes after 60%... Very simple scene etc.
I looked at the VRAM workload while rendering. The 720p and higher versions were always at 2.5 to 2.9 GB (90 to 96%).
Now I've started doing 576p pictures; the workload is only at 80% max, but it still crashes.
Here from GPU-Z:
Before, while and after crashing:
Green is the GPU workload in %
Red is the used GPU memory (I have 3GB)
The memory controller is always between 50-60% while rendering.
1. GPU clock
2. GPU memory clock
3. GPU Temperature
4. Fan Speed
I should have 500 MB left.
This also happens with only one character, without lights (only headlamp from camera), no background, with/without hair. Simple renders.
It depends: two days ago I could render a scene in 720p with 4000 iterations. A day later I loaded the same scene with no changes, and it switched to the CPU again (at ~2000 iterations, not sure).
I always close Daz after rendering, so the memory is always "free" at the beginning.
... nvm ...
I also observed the RAM workload: I have 16 GB, and Daz uses 6-7 GB. Without Chrome the PC works with 10 GB, so 6 GB are left.
I also used the Scene Optimizer; Daz now uses less graphics memory, but it's still the same... I don't get it...
My scenes aren't very complex, and I've tried every constellation. There might be something off with the settings - any idea which setting could influence this? I am not sure a new graphics card would solve the problem.
I don't want to waste $400 on a new card and then find it's still not working. I know the 780 Ti is old, but at least a simple Iray render should work...
You should be able to do a single character, w/ hair and clothing as a studio style shot. I did many shots like that when I had a 780TI.
If you used to be able to render with the 780 Ti, yet current renders fail to CPU or crash the computer, it may be a sign of impending GPU failure. Depending on your historic usage, the 780 Ti is definitely in the window where that generation of cards begins to have higher failure rates.
Baudraufauf are you also using the 780ti for your display? If so, Windows is holding some memory for the display. Use your other card for the display and use the 780ti just for rendering. I use just the 780ti for rendering and it does well for small scenes. You may have to optimize your scene's texture sizes.
As Fastbike said early on, once your GPU has loaded your scene into its memory buffer and rendering begins, there should not be an issue of insufficient VRAM that causes defaulting to your CPU and system RAM. Once the scene has loaded, it should be good to go. One thing that I do find troubling is the data that you posted from GPU-Z. Bearing in mind that the information that such utilities give is read from the diode on the video card as interpreted and reported by the video card's firmware, and it is not uncommon for that to be less than perfectly accurate, what I see is this:
At 22:22:46 your card is rendering without errors, but your GPU temperatures have reached 80 degrees (what they were immediately prior to this is anybody's guess). That is pretty high, especially since your fan speed at the same time is only showing it/them spinning at 63%.
At 22:23:43, nearly a minute later, things have already started to go south, with the first error being reported at 22:23:15. This results in a sudden drop in GPU usage as the video card drops out and defaults to the CPU. Both fan speed and GPU temperature decline slowly thereafter since the video card is no longer being used.
Taken together, the two are not definitive, but it would be really interesting to see what the GPU-Z outputs were from the time when the render actually began, at approximately 22:10:25. If the video card was throttling prior to the first reading you show at 22:22:46, the temperatures may have been even higher than shown. As I say, in the absence of that data it is difficult to say for certain, but I suspect that the problem may be overheating of your video card, which in turn may be due to an inadequate fan profile setting.
I would suggest that you use a utility such as MSI Afterburner or similar to set up a fan profile for your card that ramps your fan speed up to 100% for anything above 70 degrees, for starters, to see what effect that has. It won't hurt anything to do so, other than a bit more fan noise, depending on your cooler. If it still gets too hot or bails on you, have it ramp the fan speed to 100% at 60 degrees - the idea is to get ahead of rising temperatures before they become an issue. Again, I cannot say that this is the problem or that adjusting your fan profile will solve anything, but it won't hurt to try.
(BTW, what make and model of card is that?)
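The suggested profile can be written down as a tiny function to make its shape concrete - illustrative only, since the real curve is configured in MSI Afterburner itself; the 70-degree / 100% point comes from the advice above, while the 50-degree floor and 30% idle speed are assumptions:

```python
def fan_percent(temp_c, full_speed_above=70):
    """Aggressive fan curve: pin the fan at 100% at or above the
    threshold, idle at an assumed 30% floor below 50 C, and ramp
    linearly in between."""
    if temp_c >= full_speed_above:
        return 100
    if temp_c <= 50:
        return 30
    # linear ramp from 30% at 50 C up to 100% at the threshold
    span = full_speed_above - 50
    return round(30 + (temp_c - 50) * 70 / span)
```

Lowering `full_speed_above` to 60, as suggested as the fallback, simply steepens the ramp so the fan reacts earlier.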
Basic answer: the whole scene is loaded into the RAM on your graphics card; if it exceeds this, it will dump out to the CPU and your system RAM, which will be slower. Use some GPU monitoring software to show graphics card RAM usage and you will see it going up before it pops; I've had it happen on my 1080 Ti, which has 11 GB of RAM.
The biggest RAM gobblers are the scene textures, although geometry can be just as bad sometimes. There are some tools available - I think one is called Iray Optimiser - that can reduce texture sizes where detail is not required so much, for example for items set in the distance.
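A minimal sketch of that kind of monitoring, assuming NVIDIA's `nvidia-smi` command-line tool is installed and on the PATH (the parsing is split out so it can be tried without a GPU; GPU-Z works too, this is just a scriptable alternative):

```python
import subprocess

def parse_vram(csv_line):
    """Parse one 'used, total' line (MiB) from nvidia-smi CSV output."""
    used, total = [int(x) for x in csv_line.strip().split(",")]
    return used, total, round(100 * used / total, 1)

def query_vram():
    """Ask nvidia-smi for used/total VRAM; needs an NVIDIA driver installed."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"], text=True)
    return parse_vram(out)

# Example poll (run while loading the scene and again during iterations;
# usage should plateau once the scene has finished loading):
#   import time
#   while True:
#       print("%d / %d MiB (%s%%)" % query_vram()); time.sleep(2)
```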
Sorry, but I still contend that the problem is with the graphics card overheating, compounded by the fact that the user has overclocked the card to begin with.
I agree with @Kevin Sanderson. In my experience it is much better to have a GPU entirely dedicated to Iray and a separate GPU for the viewport. If you only have one card, you can use the processor's integrated GPU for the viewport (Intel HD). This helps both Iray reliability and viewport response time.
The other things you did - disabling OptiX and rendering to file - are fine too for improving reliability. Of course you also have to avoid GPU overheating, if that is what's happening.
Hope this helps.
So, first, thanks to everyone. The temperature was the culprit.
I didn't overclock the GPU myself; it's an EVGA 780 Ti Superclocked, I think. The fan curve was made by me. I thought 80° was OK; it seems it's not. I never had any trouble with games.
I increased the fan speed - only one crash since yesterday, and that was at 4500 iterations, more than ever before. The card is now at 73° max, but it's way too loud now. The GPU cooler will be changed; I have a Raijintek Morpheus I lying around (I'll also get a 650 Ti for the monitor).
Every other scene rendered successfully. Reducing the scenes with the "Scene Optimizer" also helps a lot: 50-80% VRAM usage, and it renders a lot faster with minimal quality change.
Again, thanks everyone for helping me out :)
@Baudraufauf
80 °C is the design operating temperature of the 780 Ti. It should be fine to operate there for extended periods. If the fan curve can't keep the temperature at or below 80, the card should start reducing clock speed until the temperature returns to limits.
Sadly I suspect card failure is in your "not distant" future.