Constant crashes during render since upgrading DAZ Studio
I recently upgraded DAZ Studio to the latest version, along with my NVIDIA drivers. Since then I've been getting frequent crashes: either my display blanks or my system goes to the "BSOD." This has not happened before, at least not with this frequency.
Here is some data I've collected. If anybody else is seeing similar errors, please respond. I do NOT believe this is related to hardware (and you may see why below). Any ideas what may be causing these failures?
SYSTEM DATA:
Dual NVIDIA RTX 2080 Ti cards
MSI Afterburner reports approximately 5800 MB VRAM usage (with 8192 MB possible max)
Max GPU temps are around 70-72C during render
Scene contains:
One Genesis 8 Female figure with eyelashes and textures
One hair figure (dforce Primavera Hair)
One strand-based hair prop
Environment map
-- No other special rendering options enabled (e.g. bloom, denoise) except default options.
Texture Compression settings: 512 / 2048
Allow CPU fallback disabled
DAZ Studio Version: 4.15.0.2
NVIDIA Driver Version: 462.31 (also tried reverting to 451.48)
LOG FILE ATTACHED (cannot paste here due to message size limits)
Comments
The "illegal memory access was encountered" does surface quite often, but so far I haven't seen a post that would explain the cause.
One possible culprit is that you have miscellaneous parts of different GPU drivers installed in your system.
This is not the same problem, but it gives reason to look deeper at what's actually installed:
https://www.daz3d.com/forums/discussion/comment/6635711/#Comment_6635711
I didn't find that post before, but I did find other posts about making sure remnants of older drivers are cleaned up. Referencing the post you linked, I did not find extraneous copies of the files mentioned (nvrtum64.dll or nvoptix.dll). Also, that post does not seem to be directly related to my issue, as the OP discusses renders never starting up. Mine start, but then crash after some non-specific amount of time.
I have a gut feeling it's related somehow to DAZ either not releasing unused resources from VRAM, or trying to run some concurrent process during rendering to clean up resources. I have also run some stress testing tools on my GPU and they have not produced any errors, but most of the free tools seem to be limited to 4 GB of VRAM for testing, and that leaves me with an unknown.
Update: No success in determining the cause of this issue yet. However, rendering with each GPU independently does not seem to crash Iray (at least, not yet). Will update again if there are any changes.
Assuming you have Windows 10, I would suggest the first step is to type "Reliability" in the search bar, and click on Reliability History. It will tell you what's been going on, both hardware- and software-wise, on your computer for the past days and weeks.
Here is what I found in the Reliability Monitor:
Problem Event Name: BEX64
Application Name: DAZStudio.exe
Application Version: 4.15.0.2
Application Timestamp: 5ff62ba3
Fault Module Name: nvoglv64.dll
Fault Module Version: 27.21.14.6231
Fault Module Timestamp: 606f6d9d
Exception Offset: 0000000001185689
Exception Code: c0000409
Exception Data: 0000000000000007
OS Version: 10.0.19042.2.0.0.256.48
Locale ID: 1033
Additional Information 1: c54a
Additional Information 2: c54ac790a3035046c28e4b0030ea2062
Additional Information 3: 634f
Additional Information 4: 634fce0db46acba10a7bdbef658958dd
Problem Event Name: LiveKernelEvent
Code: 141
Parameter 1: ffff8302647d0460
Parameter 2: fffff80794c3dbe4
Parameter 3: 0
Parameter 4: 580
OS version: 10_0_19042
Service Pack: 0_0
Product: 256_1
OS Version: 10.0.19042.2.0.0.256.48
Locale ID: 1033
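For anyone reading reports like the two above, here is a minimal sketch that parses the "Key: Value" lines and looks up the two codes that appear in them. The meanings in the table are my understanding of the standard Windows codes (c0000409 as the fail-fast/stack buffer overrun status, 141 as the video engine timeout bug check), not something stated in this thread, and the function names are made up for illustration.

```python
# Meanings for the two codes seen in the reports above.
# Assumption: these follow the usual Windows NTSTATUS / bug check naming;
# this is a tiny illustrative table, not a complete reference.
KNOWN_CODES = {
    ("Exception Code", "c0000409"): "STATUS_STACK_BUFFER_OVERRUN (fail-fast abort)",
    ("Code", "141"): "VIDEO_ENGINE_TIMEOUT_DETECTED (GPU stopped responding)",
}


def parse_report(text):
    """Turn 'Key: Value' lines into a dict, keeping the first value per key."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() not in fields:
            fields[key.strip()] = value.strip()
    return fields


def explain(fields):
    """Return a description for every known code found in the parsed fields."""
    notes = []
    for (key, code), meaning in KNOWN_CODES.items():
        if fields.get(key, "").lower() == code:
            notes.append(f"{key} {code}: {meaning}")
    return notes
```

Pasting either report into `parse_report` and passing the result to `explain` gives a one-line hint about what kind of failure Windows recorded. It does not say which component is actually at fault.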
Just a long shot... are you working with the viewport set to 'Iray preview'?
I'm not sure what "reliability monitor" output you're showing, but what I'm referring to is "Reliability History".
nvoglv64.dll would be the NVIDIA GPU driver, the bit that handles OpenGL.
That part I understand. But why is it now crashing so often? This file is installed by the NVIDIA driver package, not by DAZ, correct? I've tried both the latest version and the minimum version for DAZ 4.15 (per the pinned post on this forum). Am I to assume there's an issue with the driver? DAZ? Windows? It's very aggravating and causing hours of lost render time since renders are not saved as they progress.
Reliability Monitor is what opens up when I search for Reliability and click on "View reliability history" (from the Control Panel). The history is a chart showing back about a month, and clicking any (X) shows the data I posted above.
Not a long shot. I do this sometimes, but not all the time. Is there an issue with it? I've been using Iray preview for years now without any problems. This crashing issue has crept up recently. However, even after closing down DAZ, rebooting, and opening my scene again, it crashes when I go straight to full render without any preview/edits.
If you allow CPU fallback, it might allow Studio to write something to the log.
It doesn't always close DAZ down, sometimes it simply ends the render after a few minutes and it then writes to the log. That is one of the logs I included at the start of this thread.
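Since the log does get written when the render simply ends, a small script can pull out the lines worth looking at. This is only a sketch: it searches for the "illegal memory access was encountered" message quoted earlier in the thread, and the function name is hypothetical. You would read your own log file into a string first.

```python
def find_error_lines(log_text, needle="illegal memory access was encountered"):
    """Return (line_number, line) pairs containing the needle, ignoring case."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        if needle.lower() in line.lower():
            hits.append((number, line.strip()))
    return hits
```

The line numbers make it easier to see what the renderer was doing just before the error, which is usually more informative than the error line itself.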
Can you guys find a solution? I'm having the same problem as well.
My apologies for never updating this thread. My extensive testing revealed that one of my two GPUs had (or developed) a defect that didn't show up until a render had been running for some time. My theory is that it was due to the poor cooling design on the RTX 2080 FE series. It was replaced under warranty and everything works now.
I suggest that anybody who uses more than one GPU make sure to get the right type of fan configuration (on the GPU itself) for multi-GPU work. The 2080 FE is crap when it comes to cooling design, since two or three cards may sit right next to each other and block the heat sink/fan intakes.
To prevent this from happening again, I had to construct a box with a powerful fan that blows directly on the two cards during rendering sessions, and install MSI Afterburner (to modify the fan thresholds). This keeps the GPU temperatures below 75C; anything higher tends to cause the GPU to throttle its speed or just operate at or over 80C constantly. Hope this helps.
P.S. I used Furmark to stress test my GPUs and discovered the naughty card.
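If you want to keep an eye on the 75C target per card while a render runs, here is a rough sketch that polls `nvidia-smi` (the command-line tool installed with the NVIDIA driver) and flags any GPU above a limit. The function names and the dictionary layout are my own choices for illustration, and it assumes `nvidia-smi` is on your PATH.

```python
import subprocess

# nvidia-smi query for index, temperature (C), and VRAM used/total (MiB).
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,temperature.gpu,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_smi(output):
    """Parse the CSV lines from the query above into one dict per GPU."""
    readings = []
    for line in output.strip().splitlines():
        index, temp, used, total = [part.strip() for part in line.split(",")]
        readings.append({
            "gpu": int(index),
            "temp_c": int(temp),
            "vram_used_mb": int(used),
            "vram_total_mb": int(total),
        })
    return readings


def hot_gpus(readings, limit_c=75):
    """Return the indices of GPUs running hotter than limit_c."""
    return [r["gpu"] for r in readings if r["temp_c"] > limit_c]


def poll_once():
    """Run nvidia-smi once and return the parsed readings."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_smi(result.stdout)
```

Calling `hot_gpus(poll_once())` every few seconds during a render shows whether one card is consistently running hotter than the other, which is the pattern to look for when the cards sit right next to each other.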