GPUs crash during Iray render

Here's my system:

i7 4790

16 GB RAM

1200 W power supply

GTX 970

GTX 780 Ti

Admittedly, some of my renders are complex. I have trimmed them down so they fit in my GPU memory. If I render at full power, a render might run for about 20 minutes and then fall back to CPU. I can get it working by throttling both cards down to 75%, but obviously I would like to be able to run at full power. I'm wondering if the issue is heat? TechPowerUp GPU-Z tells me that after about 20 minutes, the 780 Ti runs at 76 degrees and the 970 at 73 degrees. Any thoughts?

Comments

  • mjc1016 Posts: 15,001

    What does the log file say when the cards quit?

    Help > Troubleshooting > View Log and scroll down to where they drop out...

  • areg5 Posts: 617
    mjc1016 said:

    What does the log file say when the cards quit?

    Help > Troubleshooting > View Log and scroll down to where they drop out...

    The Daz log file?

  • areg5 Posts: 617
    areg5 said:
    mjc1016 said:

    What does the log file say when the cards quit?

    Help > Troubleshooting > View Log and scroll down to where they drop out...

    2016-30-07 13:30:11.870 WARNING: dzneuraymgr.cpp(261): Iray ERROR - module:category(IRAY:RENDER):   1.2   IRAY   rend error: Kernel [18] failed after 0.000s
    2016-30-07 13:30:11.871 WARNING: dzneuraymgr.cpp(261): Iray ERROR - module:category(IRAY:RENDER):   1.7   IRAY   rend error: Kernel [9] failed after 0.000s
    2016-30-07 13:30:11.871 WARNING: dzneuraymgr.cpp(261): Iray ERROR - module:category(IRAY:RENDER):   1.2   IRAY   rend error: CUDA device 1 (GeForce GTX 780 Ti): an illegal memory access was encountered (while launching CUDA renderer in core_renderer_wf.cpp:821)
    2016-30-07 13:30:11.871 WARNING: dzneuraymgr.cpp(261): Iray ERROR - module:category(IRAY:RENDER):   1.7   IRAY   rend error: CUDA device 0 (GeForce GTX 970): an illegal memory access was encountered (while launching CUDA renderer in core_renderer_wf.cpp:821)
    2016-30-07 13:30:11.871 WARNING: dzneuraymgr.cpp(261): Iray ERROR - module:category(IRAY:RENDER):   1.2   IRAY   rend error: CUDA device 1 (GeForce GTX 780 Ti): Failed to launch renderer
    2016-30-07 13:30:11.871 WARNING: dzneuraymgr.cpp(261): Iray ERROR - module:category(IRAY:RENDER):   1.7   IRAY   rend error: CUDA device 0 (GeForce GTX 970): Failed to launch renderer
    2016-30-07 13:30:11.871 WARNING: dzneuraymgr.cpp(261): Iray ERROR - module:category(IRAY:RENDER):   1.3   IRAY   rend error: CUDA device 1 (GeForce GTX 780 Ti): an illegal memory access was encountered (while de-allocating memory)
    2016-30-07 13:30:11.871 WARNING: dzneuraymgr.cpp(261): Iray ERROR - module:category(IRAY:RENDER):   1.4   IRAY   rend error: CUDA device 0 (GeForce GTX 970): an illegal memory access was encountered (while de-allocating memory)
    2016-30-07 13:30:11.871 WARNING: dzneuraymgr.cpp(261): Iray ERROR - module:category(IRAY:RENDER):   1.3   IRAY   rend error: CUDA device 1 (GeForce GTX 780 Ti): an illegal memory access was encountered (while de-allocating memory)
    2016-30-07 13:30:11.871 WARNING: dzneuraymgr.cpp(261): Iray ERROR - module:category(IRAY:RENDER):   1.4   IRAY   rend error: CUDA device 0 (GeForce GTX 970): an illegal memory access was encountered (while de-allocating memory)
    2016-30-07 13:30:11.871 WARNING: dzneuraymgr.cpp(261): Iray ERROR - module:category(IRAY:RENDER):   1.3   IRAY   rend error: CUDA device 1 (GeForce GTX 780 Ti): an illegal memory access was encountered (while de-allocating memory)
    2016-30-07 13:30:11.871 WARNING: dzneuraymgr.cpp(261): Iray ERROR - module:category(IRAY:RENDER):   1.4   IRAY   rend error: CUDA device 0 (GeForce GTX 970): an illegal memory access was encountered (while de-allocating memory)
    2016-30-07 13:30:11.871 WARNING: dzneuraymgr.cpp(261): Iray ERROR - module:category(IRAY:RENDER):   1.3   IRAY   rend error: CUDA device 1 (GeForce GTX 780 Ti): an illegal memory access was encountered (while de-allocating memory)
    2016-30-07 13:30:11.872 WARNING: dzneuraymgr.cpp(261): Iray ERROR - module:category(IRAY:RENDER):   1.4   IRAY   rend error: CUDA device 0 (GeForce GTX 970): an illegal memory access was encountered (while de-allocating memory)
    2016-30-07 13:30:11.872 WARNING: dzneuraymgr.cpp(261): Iray ERROR - module:category(IRAY:RENDER):   1.3   IRAY   rend error: CUDA device 1 (GeForce GTX 780 Ti): an illegal memory access was encountered (while de-allocating memory)
    2016-30-07 13:30:11.872 WARNING: dzneuraymgr.cpp(261): Iray ERROR - module:category(IRAY:RENDER):   1.4   IRAY   rend error: CUDA device 0 (GeForce GTX 970): an illegal memory access was encountered (while de-allocating memory)
    2016-30-07 13:30:11.872 WARNING: dzneuraymgr.cpp(261): Iray ERROR - module:category(IRAY:RENDER):   1.3   IRAY   rend error: CUDA device 1 (GeForce GTX 780 Ti): Device failed while rendering
    2016-30-07 13:30:11.872 WARNING: dzneuraymgr.cpp(261): Iray ERROR - module:category(IRAY:RENDER):   1.4   IRAY   rend error: CUDA device 0 (GeForce GTX 970): Device failed while rendering
    2016-30-07 13:30:11.872 WARNING: dzneuraymgr.cpp(261): Iray ERROR - module:category(IRAY:RENDER):   1.3   IRAY   rend error: CUDA device 1 (GeForce GTX 780 Ti): an illegal memory access was encountered (while initializing memory buffer)
    2016-30-07 13:30:11.872 WARNING: dzneuraymgr.cpp(261): Iray ERROR - module:category(IRAY:RENDER):   1.4   IRAY   rend error: CUDA device 0 (GeForce GTX 970): an illegal memory access was encountered (while initializing memory buffer)
    2016-30-07 13:30:11.872 WARNING: dzneuraymgr.cpp(261): Iray ERROR - module:category(IRAY:RENDER):   1.3   IRAY   rend error: CUDA device 1 (GeForce GTX 780 Ti): an illegal memory access was encountered (while de-allocating memory)
    2016-30-07 13:30:11.872 WARNING: dzneuraymgr.cpp(261): Iray ERROR - module:category(IRAY:RENDER):   1.4   IRAY   rend error: All workers failed: aborting render
    2016-30-07 13:30:11.872 WARNING: dzneuraymgr.cpp(261): Iray ERROR - module:category(IRAY:RENDER):   1.3   IRAY   rend error: CUDA device 1 (GeForce GTX 780 Ti): an illegal memory access was encountered (while de-allocating memory)
    2016-30-07 13:30:11.872 WARNING: dzneuraymgr.cpp(261): Iray ERROR - module:category(IRAY:RENDER):   1.4   IRAY   rend error: CUDA device 0 (GeForce GTX 970): an illegal memory access was encountered (while de-allocating memory)
    2016-30-07 13:30:11.872 WARNING: dzneuraymgr.cpp(261): Iray ERROR - module:category(IRAY:RENDER):   1.3   IRAY   rend error: CUDA device 1 (GeForce GTX 780 Ti): an illegal memory access was encountered (while de-allocating memory)
    2016-30-07 13:30:11.872 WARNING: dzneuraymgr.cpp(261): Iray ERROR - module:category(IRAY:RENDER):   1.4   IRAY   rend error: CUDA device 0 (GeForce GTX 970): an illegal memory access was encountered (while de-allocating memory)
    2016-30-07 13:30:11.872 WARNING: dzneuraymgr.cpp(261): Iray ERROR - module:category(IRAY:RENDER):   1.3   IRAY   rend error: CUDA device 1 (GeForce GTX 780 Ti): an illegal memory access was encountered (while de-allocating memory)
    2016-30-07 13:30:11.873 WARNING: dzneuraymgr.cpp(261): Iray ERROR - module:category(IRAY:RENDER):   1.4   IRAY   rend error: CUDA device 0 (GeForce GTX 970): an illegal memory access was encountered (while de-allocating memory)
    2016-30-07 13:30:11.873 WARNING: dzneuraymgr.cpp(261): Iray ERROR - module:category(IRAY:RENDER):   1.3   IRAY   rend error: CUDA device 1 (GeForce GTX 780 Ti): an illegal memory access was encountered (while de-allocating memory)
    2016-30-07 13:30:11.873 WARNING: dzneuraymgr.cpp(261): Iray ERROR - module:category(IRAY:RENDER):   1.4   IRAY   rend error: CUDA device 0 (GeForce GTX 970): an illegal memory access was encountered (while de-allocating memory)
    2016-30-07 13:30:11.873 WARNING: dzneuraymgr.cpp(261): Iray ERROR - module:category(IRAY:RENDER):   1.3   IRAY   rend error: CUDA device 1 (GeForce GTX 780 Ti): an illegal memory access was encountered (while de-allocating memory)
    2016-30-07 13:30:11.873 WARNING: dzneuraymgr.cpp(261): Iray ERROR - module:category(IRAY:RENDER):   1.4   IRAY   rend error: CUDA device 0 (GeForce GTX 970): an illegal memory access was encountered (while de-allocating memory)
    2016-30-07 13:30:11.873 WARNING: dzneuraymgr.cpp(261): Iray ERROR - module:category(IRAY:RENDER):   1.3   IRAY   rend error: CUDA device 1 (GeForce GTX 780 Ti): an illegal memory access was encountered (while de-allocating memory)
    2016-30-07 13:30:11.873 WARNING: dzneuraymgr.cpp(261): Iray ERROR - module:category(IRAY:RENDER):   1.4   IRAY   rend error: CUDA device 0 (GeForce GTX 970): an illegal memory access was encountered (while de-allocating memory)
    2016-30-07 13:30:11.873 WARNING: dzneuraymgr.cpp(261): Iray ERROR - module:category(IRAY:RENDER):   1.3   IRAY   rend error: CUDA device 1 (GeForce GTX 780 Ti): an illegal memory access was encountered (while de-allocating memory)
    2016-30-07 13:30:11.873 WARNING: dzneuraymgr.cpp(261): Iray ERROR - module:category(IRAY:RENDER):   1.4   IRAY   rend error: CUDA device 0 (GeForce GTX 970): an illegal memory access was encountered (while de-allocating memory)
    2016-30-07 13:30:11.873 WARNING: dzneuraymgr.cpp(261): Iray ERROR - module:category(IRAY:RENDER):   1.3   IRAY   rend error: CUDA device 1 (GeForce GTX 780 Ti): an illegal memory access was encountered (while de-allocating memory)
    2016-30-07 13:30:11.873 WARNING: dzneuraymgr.cpp(261): Iray ERROR - module:category(IRAY:RENDER):   1.4   IRAY   rend error: CUDA device 0 (GeForce GTX 970): an illegal memory access was encountered (while de-allocating memory)
    2016-30-07 13:30:11.873 WARNING: dzneuraymgr.cpp(261): Iray ERROR - module:category(IRAY:RENDER):   1.3   IRAY   rend error: CUDA device 1 (GeForce GTX 780 Ti): an illegal memory access was encountered (while de-allocating memory)
    2016-30-07 13:30:11.873 WARNING: dzneuraymgr.cpp(261): Iray ERROR - module:category(IRAY:RENDER):   1.4   IRAY   rend error: CUDA device 0 (GeForce GTX 970): an illegal memory access was encountered (while de-allocating memory)
    2016-30-07 13:30:11.874 WARNING: dzneuraymgr.cpp(261): Iray ERROR - module:category(IRAY:RENDER):   1.4   IRAY   rend error: CUDA device 0 (GeForce GTX 970): an illegal memory access was encountered (while de-allocating memory)
    2016-30-07 13:30:11.874 WARNING: dzneuraymgr.cpp(261): Iray WARNING - module:category(IRAY:RENDER):   1.4   IRAY   rend warn : All available GPUs failed.
    2016-30-07 13:30:11.874 Iray INFO - module:category(IRAY:RENDER):   1.4   IRAY   rend info : Falling back to CPU rendering.
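
A log like the one above is hard to read because the same message repeats dozens of times. As a rough sketch (not part of Daz Studio), the Python below counts the `rend error` messages per CUDA device and finds the first Iray error line, which is usually the one worth looking at. The function names and the regular expression are assumptions based only on the line format shown in this log.

```python
import re
from collections import Counter

# Matches the device part of lines such as:
#   "... rend error: CUDA device 1 (GeForce GTX 780 Ti): Failed to launch renderer"
# The pattern is an assumption derived from the log excerpt in this thread.
DEVICE_ERROR = re.compile(r"rend error: CUDA device (\d+) \(([^)]+)\): (.+)$")


def summarize_iray_errors(log_text):
    """Count 'rend error' messages per (device index, device name, message)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = DEVICE_ERROR.search(line)
        if match:
            index, name, message = match.groups()
            counts[(int(index), name, message.strip())] += 1
    return counts


def first_error_line(log_text):
    """Return the first line that contains an Iray ERROR, or None if there is none."""
    for line in log_text.splitlines():
        if "Iray ERROR" in line:
            return line.strip()
    return None
```

To use it, save the text from Help > Troubleshooting > View Log to a file, read it into a string, and pass that string to both functions. Sorting the counter with `most_common()` shows which message dominates for each card.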

     

  • I have three 780 but they are not ti's. I do notice they get hot doing heavy Iray renders so I placed 2 fans on top of the three cards and the temps went way down. I don't think factory cooling for any product is made to keep a product viable. They need it to break so they can sell you another one.

  • I have three 780 but they are not ti's. I do notice they get hot doing heavy Iray renders so I placed 2 fans on top of the three cards and the temps went way down. I don't think factory cooling for any product is made to keep a product viable. They need it to break so they can sell you another one.

    Not quite, but not completely wrong either.  The GeForce series are designed for *GAMES* which push the GPU in spurts and not for prolonged rendering where the GPUs are at full blast for extended periods.  Thus the cooling is spec'd to remove heat in bursts.  Quadros are what nVidia designs for prolonged intensive GPU use.

    Kendall

  • namffuak Posts: 4,176
    edited November 2016

    I have three 780 but they are not ti's. I do notice they get hot doing heavy Iray renders so I placed 2 fans on top of the three cards and the temps went way down. I don't think factory cooling for any product is made to keep a product viable. They need it to break so they can sell you another one.

    Not quite, but not completely wrong either.  The GeForce series are designed for *GAMES* which push the GPU in spurts and not for prolonged rendering where the GPUs are at full blast for extended periods.  Thus the cooling is spec'd to remove heat in bursts.  Quadros are what nVidia designs for prolonged intensive GPU use.

    Kendall

    I have a 980 TI and a 1080 - and I've found that they work best with a custom fan profile that starts ramping the fan speed up early and fast. If you rely on the default profile you'll end up with the card hitting the design maximum temperature and dropping off the performance as a result. Like Kendall says, they are configured for games that spike between maximum and near-minimum, not to run at 100% for hours.

    With the profile I use, the 980 TI hits about 77% fan speed and about 70 C maximum, with the gpu load at 97% and power at about 70%. I'm still playing with the 1080 to get the optimum.
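
To make the idea of a profile that "ramps up early and fast" concrete, here is a minimal Python sketch of how a fan curve maps temperature to fan speed by linear interpolation. It does not control any real fan, and the curve points are hypothetical values chosen for illustration, not the profile described in this post.

```python
def fan_speed_percent(temp_c, curve):
    """Interpolate a fan speed (percent) from (temperature C, percent) points.

    `curve` must be sorted by temperature. Temperatures below the first point
    or above the last point are clamped to that point's fan speed.
    """
    if temp_c <= curve[0][0]:
        return float(curve[0][1])
    if temp_c >= curve[-1][0]:
        return float(curve[-1][1])
    for (t_lo, s_lo), (t_hi, s_hi) in zip(curve, curve[1:]):
        if t_lo <= temp_c <= t_hi:
            fraction = (temp_c - t_lo) / (t_hi - t_lo)
            return s_lo + fraction * (s_hi - s_lo)


# Hypothetical "early and fast" curve: the fan is already at 60% by 50 C.
EARLY_RAMP_CURVE = [(30, 30), (50, 60), (70, 80), (80, 100)]
```

With a curve shaped like this, the fan is already working hard well before the card approaches its thermal limit, which is the point of ramping early for long renders.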

  • mjc1016 Posts: 15,001
    namffuak said:

     I'm still playing with the 1080 to get the optimum.

    And depending on the brand, that can be a bit of a problem as some have the fan speed way off...I believe EVGA is one that just issued an update that reworks the fan profile.

  • namffuak Posts: 4,176
    mjc1016 said:
    namffuak said:

     I'm still playing with the 1080 to get the optimum.

    And depending on the brand, that can be a bit of a problem as some have the fan speed way off...I believe EVGA is one that just issued an update that reworks the fan profile.

    Both MSI - but the 1080 is a high-end gaming card (I went for the clock speed) and features their "twin frozr" fan design which keeps the fans off until the card heats up. Right now, with the fan speed set at 10% they are pulsing between zero and 820 rpm. I'm trying to find the threshold where the fans are on continuously without running at an extreme speed.

  • areg5 Posts: 617

    So, if I understand what you are saying, the issue is heat most likely.  My cards are EVGAs, and I already did adjust the fan profile so starting at 50 degrees they go full blast.  Also set the power on both cards to 80%, and at those settings they're stable.  Does anyone have any experience with watercoolers?  I know they make them for cards.

  • namffuak Posts: 4,176
    areg5 said:

    So, if I understand what you are saying, the issue is heat most likely.  My cards are EVGAs, and I already did adjust the fan profile so starting at 50 degrees they go full blast.  Also set the power on both cards to 80%, and at those settings they're stable.  Does anyone have any experience with watercoolers?  I know they make them for cards.

    Not really. The gpu is designed to handle thermal overload by cutting back the core clocks and memory clocks. So once my 980 TI hits 83 C the clocks start dropping in speed, the render slows down, and the temperature stays constant.

    Do you keep previous renders open? Your error messages look like a memory allocation error occurred about the time you started the render. The current Studio version keeps the memory allocation on the card for open render windows, and re-allocates memory if you start a new render.
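
One way to check whether a card is throttling or nearing its limit during a render is to sample temperature, fan speed, power draw, and core clock while it runs. The sketch below assumes the `nvidia-smi` tool that ships with the NVIDIA driver is on the PATH. The helper names and the `limit_c` parameter are made up for illustration, and some cards report certain fields as not supported.

```python
import csv
import io
import subprocess

# Fields requested from nvidia-smi's --query-gpu option.
QUERY_FIELDS = ["index", "name", "temperature.gpu", "fan.speed", "power.draw", "clocks.sm"]


def parse_gpu_csv(text):
    """Parse `--format=csv,noheader,nounits` output into one dict per GPU."""
    rows = []
    for row in csv.reader(io.StringIO(text), skipinitialspace=True):
        if row:
            rows.append(dict(zip(QUERY_FIELDS, (cell.strip() for cell in row))))
    return rows


def read_gpus():
    """Run nvidia-smi once and return the parsed readings (needs nvidia-smi on PATH)."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(QUERY_FIELDS),
        "--format=csv,noheader,nounits",
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_gpu_csv(result.stdout)


def hot_gpus(rows, limit_c):
    """Return the readings whose temperature is at or above limit_c degrees C."""
    return [
        row for row in rows
        if row["temperature.gpu"].isdigit() and int(row["temperature.gpu"]) >= limit_c
    ]
```

Calling `read_gpus()` every few seconds during a render and logging the rows shows whether the core clock drops as the temperature climbs. If the cards fail while temperature and clocks look normal, heat becomes a less likely explanation.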

  • areg5 Posts: 617
    namffuak said:
    areg5 said:

    So, if I understand what you are saying, the issue is heat most likely.  My cards are EVGAs, and I already did adjust the fan profile so starting at 50 degrees they go full blast.  Also set the power on both cards to 80%, and at those settings they're stable.  Does anyone have any experience with watercoolers?  I know they make them for cards.

    Not really. The gpu is designed to handle thermal overload by cutting back the core clocks and memory clocks. So once my 980 TI hits 83 C the clocks start dropping in speed, the render slows down, and the temperature stays constant.

    Do you keep previous renders open? Your error messages look like a memory allocation error occurred about the time you started the render. The current Studio version keeps the memory allocation on the card for open render windows, and re-allocates memory if you start a new render.

    Keep them open?  I don't think so.  I do comics, so I render lots of scenes.  I render a scene, then open another one and render that one.  When you say "open render windows," do you mean keep the window it renders to open?  I use this batch rendering script, made for 3delight but it works very well for Iray, so it renders to file.  I have noticed my 780ti throttling the clock speed down.  Like I said, if I set both cards to 80% and keep the fans going when hot, it's stable.  It's just not typically stable at 100%.  Sometimes it works, sometimes it crashes.

     
