GPUs suddenly failing while rendering?

Thanks for taking a look at this post. I've tried to find an answer but had zero luck. My situation: yesterday I started up DS fine and designed a scene (to be honest it was on the larger side, but no larger than what my system has rendered before: 2 characters, clothes, hair, environment). I hit render and was stunned to see my CPU usage spike to 100% after the scene had rendered for a few minutes (in testing, it fell back to the CPU anywhere from 2% to 94% convergence). I assumed I had done something wrong and started researching. The troubleshooting log gave me the following errors:

- an illegal memory access was encountered (while launching CUDA renderer in core_renderer_wf.cpp:832)

- an illegal memory access was encountered (while initializing memory buffer)

- an illegal memory access was encountered (while de-allocating memory)

- All workers failed: aborting render

- Falling back to CPU rendering

(Sorry, I can't seem to get a screen grab of the actual log to post.) Has anyone run into this problem? I am far from an expert, so it is entirely possible I am missing something stupidly simple here. But it is really frustrating...
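For anyone who wants to check their own log for the same failure pattern, here is a minimal sketch that scans log text for the messages quoted above. The function name `find_gpu_errors` and the sample text are made up for illustration; only the error strings come from this post, and how you obtain the log text from DS is left to you.

```python
import re

# Substrings taken from the error lines quoted in this post.
ERROR_PATTERNS = [
    r"illegal memory access was encountered",
    r"All workers failed",
    r"Falling back to CPU rendering",
]


def find_gpu_errors(log_text):
    """Return (line_number, line) pairs for lines matching any known error."""
    combined = re.compile("|".join(ERROR_PATTERNS), re.IGNORECASE)
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        if combined.search(line):
            hits.append((number, line.strip()))
    return hits


# Hypothetical sample built from the messages above plus one unrelated line.
sample = """\
an illegal memory access was encountered (while initializing memory buffer)
some unrelated log line
All workers failed: aborting render
Falling back to CPU rendering
"""

for number, line in find_gpu_errors(sample):
    print(number, line)
```

The line numbers make it easier to see what was logged just before the first illegal memory access, which is usually the more informative part.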

System: i5-8600K (8th gen), Z370 Aorus MOBO, 64 GB DDR4 RAM, 1 TB SSD, 4 TB HDD, 3x 1080 Ti (2 FTW and 1 SC), 1060 (6 GB, running the monitor), Corsair CPU cooler, Corsair 1000 watt PSU, Windows 7 (64-bit). None of the cards are overclocked.

What I've tried so far: restarting DS, restarting the computer, checking that all Nvidia drivers are up to date, and even reinstalling an older driver version (no luck). GPU heat isn't an issue (I use GPU-Z to keep an eye on the cards, as well as Corsair Link). I also reinstalled DS and even ran a virus scan of the whole system (grasping at straws with that one, but...). I loaded a scene with just a GF3 and nothing else (no hair, clothes, or add-ons at all) and still got the same errors, so I'm not exceeding the memory limits of the cards. I checked PSU wattage at the plug with a Belkin tester, and there was no change from normal while the system was under load. Literally nothing changed on a system that's run fine for months, so I'm at a loss. Scenes I rendered 2 days ago with the GPUs now fall back to the CPU. I even tried turning the cards off in the advanced settings one by one and redoing the render, but nothing changed.

The last thing I can think of is that I keep getting Windows warnings that I have unsupported hardware (newer processor on Windows 7) and need to upgrade to Windows 10 (which I have been avoiding). Could that be part of the issue? This system has been running fine for almost 4 months (with the warnings), so it just seems strange for it to start acting up now. I tried a Google search for "illegal memory access was encountered", but mainly ended up with hits for crypto mining issues...

Thanks for any input on the situation, and sorry if there is any missing info here; it's my first post on the forums...


Comments

  • SixDs Posts: 2,384

    Someone was experiencing a similar issue not long ago, and the error messages turned out to be a bit of misdirection. In that case the problem was not associated with either the system RAM or the video RAM, but was actually a Windows virtual memory issue. It was solved by ensuring that sufficient paging file space/virtual memory was being allocated. To check whether this is an issue in your case, open Control Panel > System. Choose Advanced System Settings on the left. In the resulting System Properties dialogue, open the Advanced tab. You should see a Virtual memory section. Ignore what it says there about the allocated virtual memory for now and click on the Change button. In the new popup window for Virtual Memory, one of three scenarios should be showing: Windows is set to automatically manage your paging file size dynamically (it will only use what it needs), which is the default setting; or a custom paging file size has been manually set for each available drive; or no paging file has been set. If it is the last of these, that may be the problem and you should switch to one of the other options. If unsure, simply choose to let Windows manage the paging file. Warning: by default Windows will use your boot (C:) drive for your paging file. If that drive is full or nearly so, there might not be sufficient room left for the needed paging file, resulting in an out-of-memory error when Windows needs to use it. You will either need to free up and maintain sufficient space on the drive, or switch the paging file to another drive, if you have one, that does have sufficient space.

    I cannot say whether this is your particular problem or not, but it's worth checking out.

  • Thanks for the quick response, SixDs. I took a look at what you were talking about, and the selection is set to its default (automatically manage paging file size). The popup does say that the recommended amount is 98221 MB and currently lists 65481 MB allocated. Could that be the issue? I have a 256 GB SSD as the C drive, and it is currently telling me that there is about 78 GB of free space available. Sorry if that is a dumb question, I'm just starting to teach myself about this stuff. I have a new 4 TB HDD with almost nothing on it. Would it be a good test to switch the virtual memory selection to that drive to see if anything changes? Thanks again for the input!
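As a rough sanity check on the figures in this reply, the sketch below works out whether the C drive could still hold the paging file if it grew from the allocated size to the recommended size. The function names are hypothetical, and treating "about 78 GB" as 78 x 1024 MB is an assumption; `shutil.disk_usage` is the standard-library call for reading free space on a drive.

```python
import shutil


def pagefile_headroom_mb(recommended_mb, allocated_mb, free_mb):
    """Free space (MB) left over if the paging file grew to the recommended size."""
    extra_needed = max(recommended_mb - allocated_mb, 0)
    return free_mb - extra_needed


def free_space_mb(path):
    """Free space on the drive containing `path`, in MB (1 MB = 1024 * 1024 bytes)."""
    return shutil.disk_usage(path).free // (1024 * 1024)


# Figures quoted in the post; 78 GB is converted assuming 1 GB = 1024 MB.
headroom = pagefile_headroom_mb(
    recommended_mb=98221,
    allocated_mb=65481,
    free_mb=78 * 1024,
)
print(headroom)  # 47132
```

On these numbers the paging file would need about 32 GB more to reach the recommended size, which still fits in the reported free space. To check a live system instead, pass `free_space_mb("C:\\")` as `free_mb`.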

  • ebergerly Posts: 3,255
    edited July 2018

    I think the pagefile issue is irrelevant here. The system has 64GB of system RAM. It only needs the pagefile if it runs out of system RAM, which I doubt it's doing. Task Manager should tell you for sure. I also have 64GB of system RAM, and the system gives an automatic paging file size of only around 9GB, same as you, and it works fine, even on big scenes that use around 30GB of system RAM.

    Personally, I think your system is probably far too complicated at this point to narrow down the problem (4 GPUs, etc.), so I'd recommend you simplify it drastically: remove any unneeded hardware and software, and try to render the scene with the least amount of hardware. With all that hardware and software it could be anything: drivers, hardware problems, etc. And while you're doing that, monitor the heck out of everything, using GPU-Z and similar tools. Monitor temperatures and hardware usage.

    EDIT: Also keep in mind that if you load a scene and do a render, then merge some more into the scene and do another render, it won't first erase the old VRAM usage; it will just add to the VRAM usage, and that might cause it to run out of VRAM and crash to CPU. So if you're doing that without first closing and re-opening Studio and reloading the desired scene from scratch, you might have problems.
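One way to watch per-card temperature and VRAM usage between renders, alongside GPU-Z, is to poll `nvidia-smi`, which ships with the NVIDIA driver. The sketch below is only an illustration: the helper names are hypothetical, the sample row is invented rather than taken from the poster's system, and it assumes `nvidia-smi` is on the PATH.

```python
import subprocess

# Fields accepted by nvidia-smi's --query-gpu option.
QUERY_FIELDS = "index,name,temperature.gpu,memory.used,memory.total"


def parse_gpu_csv(text):
    """Parse 'csv,noheader,nounits' output into one dict per GPU."""
    gpus = []
    for line in text.strip().splitlines():
        index, name, temp, used, total = [field.strip() for field in line.split(",")]
        gpus.append({
            "index": int(index),
            "name": name,
            "temp_c": int(temp),
            "used_mib": int(used),
            "total_mib": int(total),
        })
    return gpus


def query_gpus():
    """Run nvidia-smi once and return the parsed per-GPU readings."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY_FIELDS}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_csv(result.stdout)


# Invented sample row, only to show the expected shape of the output.
sample_output = "0, GeForce GTX 1080 Ti, 61, 10240, 11264\n"
for gpu in parse_gpu_csv(sample_output):
    print(gpu["index"], gpu["name"], gpu["temp_c"], gpu["used_mib"], gpu["total_mib"])
```

Calling `query_gpus()` before and after each render would show whether VRAM usage keeps climbing across renders in the same session, as described above.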

     

  • ebergerly Posts: 3,255
    Seems like others are reporting GPU memory access errors and renders crashing to CPU, and a common thread seems to be Win 7 64-bit and "latest drivers". Maybe it's time to visit the NVIDIA drivers forum and see if others are seeing the same.
  • allfunandgames3D Posts: 5
    edited July 2018

    I stand corrected, issue still ongoing :(

  • Weirdly enough, there wasn't really a mention of the error that I could find doing a quick search on the Nvidia forums...

     
