DAZ Studio 4.21.0.5 new error behavior on GPU OOM

This is almost certainly a GPU out-of-memory (OOM) issue, but I've never seen it before so I figure I'll document it here.  I have a simple scene (HDRI, one character) but I'm running the render at SubD 5 on a (dedicated) Titan Xp with 12 GB of GPU RAM (yeah, I know, buy a 4090...).  So this is right at the limit of the GPU RAM; in this case the OOM fallback could only fit a partial 90 MiB frame buffer on the device (instead of the 180 MiB requested) and allocated a 180 MiB host-side frame buffer in CPU memory:

2022-10-19 09:51:45.916 Iray [INFO] - IRAY:RENDER ::   1.10  IRAY   rend info : CUDA device 0 (NVIDIA TITAN Xp): Scene processed in 4.908s
2022-10-19 09:51:45.916 [WARNING] :: ..\..\..\..\..\src\pluginsource\DzIrayRender\dzneuraymgr.cpp(369): Iray [WARNING] - IRAY:RENDER ::   1.10  IRAY   rend warn : CUDA device 0 (NVIDIA TITAN Xp): Failed to allocate 180.000 MiB for (device) frame buffer, will try allocating smaller (partial) frame buffer
2022-10-19 09:51:45.932 Iray [INFO] - IRAY:RENDER ::   1.10  IRAY   rend info : CUDA device 0 (NVIDIA TITAN Xp): Allocated 90.000 MiB for device frame buffer
2022-10-19 09:51:45.947 Iray [INFO] - IRAY:RENDER ::   1.10  IRAY   rend info : CUDA device 0 (NVIDIA TITAN Xp): Allocated 180.000 MiB for host-side frame buffer
2022-10-19 09:51:45.947 [WARNING] :: ..\..\..\..\..\src\pluginsource\DzIrayRender\dzneuraymgr.cpp(369): Iray [WARNING] - IRAY:RENDER ::   1.10  IRAY   rend warn : CUDA device 0 (NVIDIA TITAN Xp): Succeeded in allocating partial device frame buffer. Device efficiency will be affected.

This doesn't worry me; it's a 4K render and this happens quite frequently.  Despite the ominous warning, my experience is that performance is hardly affected.  Everything went fine for a while:

2022-10-19 09:51:49.453 Iray [INFO] - IRAY:RENDER ::   1.10  IRAY   rend info : Allocating 1-layer frame buffer
2022-10-19 09:51:49.531 Iray [INFO] - IRAY:RENDER ::   1.0   IRAY   rend progr: Received update to 00001 iterations after 8.519s.
2022-10-19 09:51:52.786 Iray [INFO] - IRAY:RENDER ::   1.0   IRAY   rend progr: Received update to 00002 iterations after 11.778s.
2022-10-19 09:51:56.073 Iray [INFO] - IRAY:RENDER ::   1.0   IRAY   rend progr: Received update to 00003 iterations after 15.052s.
* * *
2022-10-19 14:38:47.369 Iray [INFO] - IRAY:RENDER ::   1.0   IRAY   rend progr: 99.24% of image converged
2022-10-19 14:38:47.389 Iray [INFO] - IRAY:RENDER ::   1.0   IRAY   rend progr: Received update to 04791 iterations after 17226.367s.

 

At which point this happened:

2022-10-19 14:41:59.678 Iray [INFO] - IRAY:RENDER ::   1.0   IRAY   rend progr: 99.30% of image converged
2022-10-19 14:41:59.716 Iray [INFO] - IRAY:RENDER ::   1.0   IRAY   rend progr: Received update to 04844 iterations after 17418.696s.
2022-10-19 14:42:35.365 Iray [INFO] - IRAY:RENDER ::   1.14  IRAY   rend info : CUDA device 0 (NVIDIA TITAN Xp): Prevent device timeout
2022-10-19 14:42:35.567 Iray [INFO] - IRAY:RENDER ::   1.14  IRAY   rend info : CUDA device 0 (NVIDIA TITAN Xp): Execute device timeout
2022-10-19 14:42:35.577 Iray [INFO] - IRAY:RENDER ::   1.4   IRAY   rend info : CUDA device 0 (NVIDIA TITAN Xp): Prevented device timeout
2022-10-19 14:42:35.577 Iray [INFO] - IRAY:RENDER ::   1.4   IRAY   rend info : Device timeout executed, resume 62 unfinished samples.
2022-10-19 14:42:37.796 Iray [INFO] - IRAY:RENDER ::   1.14  IRAY   rend info : CUDA device 0 (NVIDIA TITAN Xp): Prevent device timeout
2022-10-19 14:42:37.999 Iray [INFO] - IRAY:RENDER ::   1.14  IRAY   rend info : CUDA device 0 (NVIDIA TITAN Xp): Execute device timeout
2022-10-19 14:42:38.166 Iray [INFO] - IRAY:RENDER ::   1.4   IRAY   rend info : CUDA device 0 (NVIDIA TITAN Xp): Prevented device timeout
2022-10-19 14:42:38.166 Iray [INFO] - IRAY:RENDER ::   1.4   IRAY   rend info : Device timeout executed, resume 2055 unfinished samples.

So far as I can tell the render is now dead in the water (DITW):

2022-10-19 15:08:00.255 Iray [INFO] - IRAY:RENDER ::   1.4   IRAY   rend info : Device timeout executed, resume 1684 unfinished samples.
2022-10-19 15:08:01.986 Iray [INFO] - IRAY:RENDER ::   1.14  IRAY   rend info : CUDA device 0 (NVIDIA TITAN Xp): Prevent device timeout
2022-10-19 15:08:02.189 Iray [INFO] - IRAY:RENDER ::   1.14  IRAY   rend info : CUDA device 0 (NVIDIA TITAN Xp): Execute device timeout
2022-10-19 15:08:02.314 Iray [INFO] - IRAY:RENDER ::   1.4   IRAY   rend info : CUDA device 0 (NVIDIA TITAN Xp): Prevented device timeout

No advance from 99.3% converged and no more updates until:

2022-10-19 15:18:15.583 Iray [INFO] - IRAY:RENDER ::   1.0   IRAY   rend progr: 99.30% of image converged
2022-10-19 15:18:15.612 Iray [INFO] - IRAY:RENDER ::   1.0   IRAY   rend progr: Received update to 04898 iterations after 19594.593s.

That was the last message in the log file at the time, though regular (5 minute) updates have now resumed.  (It's difficult to track this in real time because Studio doesn't seem to flush the log regularly.)
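
For anyone who wants to check their own log for the same kind of gap after the fact, here is a minimal Python sketch.  It only assumes the "Received update to N iterations after Xs" line format shown in the excerpts above; the function names (iteration_updates, stalls) and the 10 minute threshold are my own illustrative choices, not anything provided by Studio or Iray.

```python
import re
from datetime import datetime

# Matches the progress lines shown above, e.g.
# "2022-10-19 14:41:59.716 Iray [INFO] - ... Received update to 04844 iterations after 17418.696s."
PROGRESS = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) .*"
    r"Received update to (\d+) iterations after ([\d.]+)s"
)


def iteration_updates(lines):
    """Return (timestamp, iteration_count, render_seconds) for each progress line."""
    updates = []
    for line in lines:
        match = PROGRESS.match(line)
        if match:
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")
            updates.append((stamp, int(match.group(2)), float(match.group(3))))
    return updates


def stalls(updates, threshold_s=600.0):
    """Yield (previous, current, gap_seconds) where the wall-clock gap exceeds threshold_s."""
    for previous, current in zip(updates, updates[1:]):
        gap = (current[0] - previous[0]).total_seconds()
        if gap > threshold_s:
            yield previous, current, gap
```

To use it, read the log file (wherever your copy of it lives) into a list of lines, pass that to iteration_updates, then loop over stalls(...) and print each gap.  Fed the two "Received update" lines on either side of the stall above (04844 and 04898 iterations), it reports a single gap of roughly 36 minutes covering 54 iterations.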

This is no big deal for me.  I'm rendering to 100% convergence, so 99.30% leaves only 58,061 pixels not converged, if I understand what the "converged" percentage means.  It seems entirely possible that the problem corresponds to me opening another scene in Studio, which consumes between 100 and 200 MB of GPU RAM (just for starting Studio).  I use the Titan Xp as a dedicated compute card: Studio, Photoshop and PTGui use it, but nothing else that I can find out how to disable.  While the render had, apparently, recovered from the problems (i.e. it was producing new iterations), convergence wasn't increasing fast enough for me, so I just cancelled it; the missing pixels aren't immediately apparent ;-)
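
For reference, the unconverged pixel count is simple arithmetic on the last convergence figure in the log.  The sketch below assumes "4K" means 3840 x 2160 (the exact render dimensions aren't stated above) and that the percentage is a plain fraction of all pixels in the image; unconverged_pixels is just an illustrative name.

```python
def unconverged_pixels(width, height, converged_percent):
    """Estimate how many pixels are not yet converged at a given convergence percentage."""
    total = width * height
    return round(total * (100.0 - converged_percent) / 100.0)


# Assumed 4K UHD dimensions; 99.30% is the last convergence value in the log.
print(unconverged_pixels(3840, 2160, 99.30))  # prints 58061
```

That is 0.7% of 8,294,400 pixels, which matches the figure quoted above.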
