Daz Studio Pro BETA - version 4.12.2.60! (*UPDATED*)


Comments

  • marble said:

    While I reported earlier that I was not experiencing CPU fallback,

    I have been away from DAZ since then, only to return yesterday and discover that I too am now experiencing the problem.

    Nothing changed - the PC hasn't even been powered on since that comment, so go figure.

    With so many people suffering from these CPU fallback problems, why hasn't DAZ already provided a facility to roll back both the Pro and Beta releases to versions where this was not a problem?

    Instead they would rather we all suffer until they find a solution,

    which may be soon but will most likely be later. Either way, they are not coming forward and keeping us users informed about where they stand on a solution.

    We get total silence, as if the problem does not exist.

    While DAZ for me is a hobby, I have great sympathy for people relying on DAZ for their income; they must be tearing their hair out right now.

     

    Most likely because many of the changes that may be contributing to these issues would require DAZ to stop updating to newer versions of Iray and the changes being made to it.

    Right now we have people who didn't upgrade from 4.11, those who are using 4.11 for some renders and a 4.12 beta for others, some who took the plunge and upgraded so have a 4.12 General Release, and still others who have both the 4.12 GR and one of the beta versions. Add to that confusion the fact that there have been several Nvidia driver updates since 4.12 hit the scene, and that some users have GTX cards and others RTX ... I mean, it is a nightmare.

     

    This isn't much different from other transitions between version numbers, insofar as some people staying with older versions goes. We also need to remember that there has been a major change in the design of Iray to support the RTX video cards, since Nvidia chose not to add support for those cards to OptiX Prime, but instead to use the full-function OptiX for them in Iray. DAZ3D has to work with Nvidia to get any issues cleared up, and sometimes it's difficult to find where in the mix a particular bug actually resides.
  • Richard Haseltine Posts: 100,839
    edited December 2019

    Ok...

    1. I reinstalled Windows 10 -- I am now on version 1909, fully patched.

    2. I installed NVIDIA drivers 441.41.

    3. I installed DAZ Studio 4.12.0.86 and DAZ Studio Public Beta 4.12.1.40

    In 4.12.0.86 I don't get immediate CPU fallback, but only if I load something into the scene before switching to Iray preview.

    In 4.12.1.40 I now get immediate CPU fallback as soon as I switch any material -- for example, loading a different eye color for the figure.

    TL;DR -- I can't render with confidence on my RTX 2080 Ti anymore.

    Great job DAZ and NVIDIA, especially the part about not even replying to tickets or forum threads where people ask for an ETA on a fix, any kind of fix for this.

    They can't give information they don't have, or of which they are not confident. That doesn't, of course, mean it isn't being looked into.

    Post edited by Richard Haseltine on
  • Saxa -- SD Posts: 872
    edited December 2019
    jrlaudio said:
    I am having the same memory allocation problems with 4.12 as described above, where with 4.11 none existed. This is rendering a single image. Now, sure, I render at 2K (16:9) and commonly have multiple fully clothed G3 and G8 figures and props, but I never had this issue before on 4.11. The images default to CPU. I'm not making animations, just single images.

    Glad you wrote that.  My impression was that memory usage had deteriorated with 4.12.  But we are also typically upgrading Nvidia drivers too. Did you test with the Nvidia drivers that were available around that time?  Since the cards you are having issues with are Titan XPs, using older Nvidia drivers should be less of an issue?  Just curious.

    Yesterday I re-ran a saved scene from my early test days on a GTX 1070 (8GB VRAM), 32GB RAM, the same CPU, and Win7 Pro x64.
    Now on an RTX 2080 Ti (11GB VRAM) and 64GB system RAM.  Hard drives are still 7200 rpm.

    The scene consisted of 13 G8Fs, all with hair and outfits, and one HDRI.  The scene was optimized to 2K textures.
    Load time for the scene was 9 minutes 28 seconds.  I don't remember it being anywhere near that long with pre-4.12 versions.  Almost 10 minutes is a long time.  I seem to remember closer to 6 minutes before, when my runtime was 1/4 to 1/3 the size.
    But that may be because my runtime is a lot bigger, with many more characters and morphs that maybe need to be considered on scene load for each character?  Just trying to understand how big a factor runtime size is when loading characters.

    Yes, each morph file (and UV set and clone shape etc.) needs to be read in with the figure. The main data, the actual changes in vertex position, is not kept in memory unless it's needed (non-zero value), but the file has to be read to see how it relates to other properties (controlling or controlled by) so that DS knows when it will be non-zero.

    Thanks for replies. Finally getting around to looking in this thread again.

    Sounds like it will be worthwhile to prune out any morphs or characters I won't likely use.
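
    Below is a minimal, purely illustrative sketch of the loading behaviour described above (it is not DAZ Studio's actual code, and the file layout and key names are simplified assumptions): every morph file under a figure's data directory has to be opened and parsed just to recover its controller links, even if its value stays at zero, while the heavy vertex deltas are only retained for non-zero morphs. That is why load time grows with the total number of installed morphs rather than with the number actually used.

```python
# Illustrative sketch only - not DAZ Studio code. Shows why figure load time
# scales with the number of installed morph files: every file is read to build
# the controller graph, but vertex deltas are kept only for non-zero morphs.
import gzip
import json
import pathlib

def scan_morphs(morph_dir: str):
    """Read every morph file once; keep vertex deltas only for non-zero morphs."""
    link_graph = {}        # morph name -> controlling / controlled property links
    resident_deltas = {}   # morph name -> vertex deltas actually held in memory
    for path in pathlib.Path(morph_dir).rglob("*.dsf"):
        with path.open("rb") as fh:
            gzipped = fh.read(2) == b"\x1f\x8b"      # DSF files are often gzip-compressed JSON
        opener = gzip.open if gzipped else open
        with opener(path, "rt", encoding="utf-8") as f:
            data = json.load(f)                      # the whole file has to be parsed...
        name = data.get("name", path.stem)
        link_graph[name] = data.get("formulas", [])  # ...just to recover its controller links
        if data.get("value", 0.0) != 0.0:            # deltas are retained only if non-zero
            resident_deltas[name] = data.get("deltas", [])
    return link_graph, resident_deltas
```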

     

    When re-rendering this scene, it rendered fine.
    But when I did more re-renders, it soon dropped to CPU.  I even had one mid-render drop to CPU, which was a first for me.
    This was on 4.12.0.86 with the 441.12 driver, which is solid for 3 characters and a heavy scene.

    I'm a little reluctant to say this without more hard proof,
    but my impression is I had fewer issues with the GTX 1070 on older pre-4.12 DAZ/older Nvidia drivers with this scene, both for load time (likely a separate issue) and for rendering.
    I would need to set up the old system and get the older 4.11 DS to be sure, which is not going to happen. So I am just left with this strong impression that something changed.

    Different versions of Iray - for one thing, the newest version always uses OptiX Prime on non-RTX cards, which has a memory overhead compared to the older versions where it wasn't used.

    So is it correct to say that RTX cards have this increased memory overhead too, under the newer Iray/Nvidia driver which DS has to work with?   That would explain things.  A cost of the RTX implementation, if I can synthesize RayDAnt's explanation of Iray development: "Nvidia drivers usable with 4.11" (hodge-podge accumulation) vs "Nvidia drivers usable with 4.12" (reimplementation).

    But if I did mis-infer that from what you wrote, then, just to be clear, the example I was citing was about scene performance: (1) GTX 1070 with DS 4.11.*.* vs (2) RTX 2080 Ti with DS 4.12, with (1) being a somewhat better scene-handling experience (based on memory, as I didn't record all the details of scene handling).  Actual render times are of course much better when it does stay on GPU.

    Post edited by Saxa -- SD on
  • Quoting Saxa -- SD's question above ("So is it correct to say that RTX cards have this increased memory overhead too...?"):

    Iray doesn't use OptiX Prime for RTX cards; it uses OptiX 6 or 7. It only uses OptiX Prime for GTX cards.

  • marble Posts: 7,500
     

     

    Quoting the reply above:

    Iray doesn't use OptiX Prime for RTX cards; it uses OptiX 6 or 7. It only uses OptiX Prime for GTX cards.

    I'm not sure that addresses the point, which is, from what I can see: does OptiX (whatever version) increase VRAM requirements? I understand that RTX does not work without OptiX and cannot, therefore, be measured against non-OptiX requirements. The fact remains that scenes that would render in 4.11 fall back to CPU in 4.12.

  • So I haven't used DAZ Studio in a while, but I popped in here a few days ago and read some of the comments about the memory problems. At that time I was trying to get GenX2 working and didn't get far enough to do any renders. Today I tried to render a new blank scene with one G8 male with hair and no clothes, and one G3 female with hair and clothes. The render fell back to CPU.

    error IRAY:RENDER ::   1.2   IRAY   rend error: OptiX Prime error (Device rtpModelUpdate BL): Memory allocation failed (Function "_rtpModelUpdate" caught exception: Encountered a CUDA error: cudaMalloc(&ptr, size) returned (2): out of memory).

    I have never had a problem with such a basic scene fitting in my memory before. The program I use to estimate the memory requirements for the scene shows I should have 2 GB of VRAM to spare. I have always avoided OptiX Prime in the past because it always seemed to use more memory.

    This is insane. I can't afford to spend $1,000 on a new graphics card just to be able to render two half-naked figures without any props. For now I might be able to switch to an older version, but I won't be able to continue this hobby if DAZ does not do something about the memory usage.
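
    Since the error above is simply cudaMalloc running out of device memory, it can help to see how much VRAM is actually free immediately before hitting render (the Windows desktop, the DS viewport and other applications all take their cut first). A minimal sketch using the nvidia-ml-py (pynvml) Python bindings, run outside DAZ Studio, purely for information:

```python
# Report total/used/free VRAM via NVML, the same interface nvidia-smi uses.
# Install the bindings first, e.g.:  pip install nvidia-ml-py3
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)      # first GPU in the system
info = pynvml.nvmlDeviceGetMemoryInfo(handle)
print(f"total: {info.total / 2**20:.0f} MiB")
print(f"used : {info.used / 2**20:.0f} MiB")       # includes desktop/WDDM and the DS viewport
print(f"free : {info.free / 2**20:.0f} MiB")       # roughly what a render can still allocate
pynvml.nvmlShutdown()
```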

  • Saxa -- SD Posts: 872
    edited December 2019
    marble said:
     

     

    I'm not sure that addresses the point, which is, from what I can see: does OptiX (whatever version) increase VRAM requirements? I understand that RTX does not work without OptiX and cannot, therefore, be measured against non-OptiX requirements. The fact remains that scenes that would render in 4.11 fall back to CPU in 4.12.

    Thanks Marble.

    I just want to know if memory overhead has increased under the newer Nvidia driver/Iray and the DS builds that implement them.  If it has, as an RTX user I'd be OK with this in what is hopefully a short-term situation.  I do feel very sorry for GTX users if, as it appears, they are also being dragged into this RTX migration soup.  And I would agree with algovincian that Nvidia and its drivers are likely at the root of this situation.

    Richard's comments seemed to imply that the new Nvidia versions have more memory overhead for all cards, which would explain why GTX 1070/DS 4.11.*.* handled the scene better than RTX 2080 Ti/DS 4.12.*.*.  Or he didn't fully get my situation, which is why I phrased it the way I did, just in case.  The under-the-hood "OptiX version this or that" is less important to me because I am approaching this as a user, not someone who wants to know every fine technical detail.  My area of interest is more in 3D/textures/animation technical detail, but I want to know enough to make informed decisions about the software environment I'm using.

     

    Post edited by Saxa -- SD on
  • RayDAnt Posts: 1,135
    edited December 2019
    marble said:
     

     

    I'm not sure that addresses the point, which is, from what I can see: does OptiX (whatever version) increase VRAM requirements? I understand that RTX does not work without OptiX and cannot, therefore, be measured against non-OptiX requirements. The fact remains that scenes that would render in 4.11 fall back to CPU in 4.12.

    The simple answer to @Saxa -- SD's original question is yes - even on RTX hardware there is an increased memory overhead connected to raytraced rendering versus older, less efficient methods (like the now-defunct "built-in" legacy raytracing alternative to OptiX Prime acceleration found in Iray versions 2018.1.3 or earlier) or non-raytraced rendering altogether. However, the amount of overhead needed on RTX cards is both smaller and less variable during an active render than what's seen on a non-RTX card, due to the presence of RT Cores.

    The underlying reason for this is that the current industry-standard implementation of raytracing is itself actually two closely related but computationally very distinct tasks rolled into one:

    1. BVH Acceleration Structure Building
    2. Ray Casting

    The first of these tasks (BVH Acceleration Structure Building) is computed once, prior to the start of a given render, and necessitates using a very large amount of memory during the first few seconds of operation, then a somewhat smaller amount throughout the rest of the rendering process. In contrast, the second of these tasks (Ray Casting) gets repeated many, many times throughout the entire rendering process and is both computationally intensive and has a widely variable memory footprint when accomplished via strictly software means (e.g. OptiX Prime). If you check the Turing architecture whitepaper (page 30) you will see that:

    RT Cores work together with advanced denoising filtering, a highly-efficient BVH acceleration structure developed by NVIDIA Research, and RTX compatible APIs to achieve real time ray tracing on single Turing GPU. RT Cores traverse the BVH autonomously, and by accelerating traversal and ray/triangle intersection tests, they offload the SM, allowing it to handle other vertex, pixel, and compute shading work. Functions such as BVH building and refitting are handled by the driver, and ray generation and shading is managed by the application through new types of shaders.

    Meaning that RTX cards are able to eschew the additional memory/computing resources required by non-RTX cards to accomplish Ray Casting (via RT Cores), but still need to rely on basic system resources (i.e. VRAM) to accomplish the construction/storage of BVH structures during rendering. Which is the usual cause of CPU fallback being triggered at the very beginning of an attempt to render a scene close to a GPU's VRAM limit.

    Imo it's also worth noting that the recent spate of problems with rendering animations (memory use spikes leading to CPU fallback at the very beginning of a subsequent frame in a render sequence) is very likely a side effect of poorly managed BVH structures on Iray's part. Which wouldn't be too much of a surprise, given that Nvidia has "better scheduling performance/less overhead" listed as a major to-do item on its most recently available Iray RTX development roadmap (see slide 33).
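
    As a purely illustrative back-of-the-envelope exercise (the constants below are generic assumptions, not Iray's actual internals), here is roughly how much extra VRAM a BVH acceleration structure can add on top of the raw triangle data, assuming a binary BVH with at most 2N-1 nodes over N triangles, ~32 bytes per node, and a 4-byte triangle index per leaf entry:

```python
# Rough, illustrative estimate of BVH memory overhead. The 32-byte node size,
# the 2N-1 node bound and the 4-byte triangle index are generic assumptions
# used for the sketch - they are not Iray's real data layout.
def bvh_overhead_mib(num_triangles: int,
                     bytes_per_node: int = 32,
                     bytes_per_tri_index: int = 4) -> float:
    max_nodes = 2 * num_triangles - 1        # binary tree with one leaf per triangle
    node_bytes = max_nodes * bytes_per_node
    index_bytes = num_triangles * bytes_per_tri_index
    return (node_bytes + index_bytes) / 2**20

# A figure-heavy, subdivided scene can easily reach tens of millions of triangles:
for tris in (1_000_000, 5_000_000, 20_000_000):
    print(f"{tris:>11,} triangles -> ~{bvh_overhead_mib(tris):.0f} MiB of BVH data")
```

    The point is not the exact numbers but the shape of the cost: the overhead scales with scene geometry, which is why it bites hardest on scenes that are already close to the card's VRAM limit.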

    Post edited by RayDAnt on
  • @RayDAnt

    That explanation should be pinned somewhere handy.  Thank you. 

    Would you have a value range for what the possible "overhead memory increase" would be compared to pre-RTX Iray builds?  My impression is it's not insignificant - like 10-20% more, or maybe more.  Again, I'm basing this "impression" on the one test I mentioned earlier, which may be quite unfair as a single test case, and there may be other unaccounted-for variables too.

  • RayDAnt Posts: 1,135
    edited December 2019

    Would you have a value range for what the possible "overhead memory increase" would be compared to pre-RTX Iray builds?  My impression is it's not insignificant - like 10-20% more, or maybe more.  Again, I'm basing this "impression" on the one test I mentioned earlier, which may be quite unfair as a single test case, and there may be other unaccounted-for variables too.

    Unfortunately there is no easy answer to this question for either RTX or GTX cards because - just as with the performance increases seen with hardware-accelerated raytracing - the amount of memory required is scene dependent. More precisely, it depends on scene content abundance and individual scene object placement, since those are the two biggest factors in determining how complex/memory-consuming the creation and storage of a scene's BVH Acceleration Structure is.

    For GTX cards, final render resolution can also be a major factor, since more individual pixels needing to be resolved = vastly greater numbers of in-software raytracing ops to perform = more potential for additional memory needed to store intermediate values from ongoing Ray Casting operations, because the fundamental dilemma with raytracing has always been that individual Ray Casting operations take an open-ended number of iterative steps to complete. Hence why people using OptiX Prime to render scenes close to the VRAM limit of their GPUs sometimes get CPU fallback on e.g. iteration 666 out of 1400 for no apparent reason. Chances are that the particular combination of paths needing to be traced for each pixel on that iteration (which, by the way, is not technically chosen at random for each iteration, despite how convincing Iray's progressive visual updating schema is) might just happen to need the maximum number of steps to resolve, leading to a momentary significant uptick in needed working memory beyond what the hardware can provide.
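
    To make the resolution point concrete, here is a rough sketch (the per-ray byte figures are assumptions chosen for illustration, not Iray's real layout) of how much per-ray working state a software ray caster might keep in flight per iteration at common render sizes:

```python
# Illustrative only: one primary ray per pixel is in flight per iteration, and
# each ray carries some working state (origin, direction, throughput, hit
# record, RNG state, ...). 64-128 bytes per ray is an assumed range.
def ray_state_mib(width: int, height: int, bytes_per_ray: int) -> float:
    return width * height * bytes_per_ray / 2**20

for w, h in ((1920, 1080), (2560, 1440), (3840, 2160)):
    lo = ray_state_mib(w, h, 64)
    hi = ray_state_mib(w, h, 128)
    print(f"{w}x{h}: ~{lo:.0f}-{hi:.0f} MiB of per-ray state per iteration")
```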

    Post edited by RayDAnt on
  • RayDAnt said:

    Unfortunately there is no easy answer to this question for either RTX or GTX cards ... the amount of memory required is scene dependent. [...]

    This is probably pretty far up the ladder, but what keeps Nvidia from developing out-of-core rendering like Octane's for this type of scenario, where it can utilize system RAM in lieu of VRAM where necessary? I can imagine that Iray's nature is far different from other render engines, but it seems like something that could solve issues for Iray if it had this.

  • Quoting the question above about out-of-core rendering:

    This is probably pretty far up the ladder, but what keeps Nvidia from developing out-of-core rendering like Octane's for this type of scenario, where it can utilize system RAM in lieu of VRAM where necessary?

    I think what he's trying to get across is that because this version of Iray is a major rewrite compared to the previous versions, it quite likely is not anywhere near as well optimized as the older versions are. Out-of-core memory wouldn't really do anything to help with that.

  • RayDAnt Posts: 1,135
    Quoting the question above:

    This is probably pretty far up the ladder, but what keeps Nvidia from developing out-of-core rendering like Octane's for this type of scenario, where it can utilize system RAM in lieu of VRAM where necessary? I can imagine that Iray's nature is far different from other render engines, but it seems like something that could solve issues for Iray if it had this.

    From page 36 of the most recent version of Nvidia's design overview of Iray (found here):

    5.6 No Out of Core Data

    To simplify data access and maximize performance, Iray is not required to support out of core scene data, i.e. each GPU holds a full copy of all scene data. This may limit the size of supported scenes on low-end and older generation GPUs.

    However, this limitation does not apply to the outputs of Iray, as framebuffers can be of arbitrary size and there can be multiple outputs enabled at the same time (see Sec. 4.1).

    As current GPUs feature memory sizes of up to 24GB and the core also supports instancing of objects, scenes originating from the application domain of Iray so far also did not exceed this constraint. For more complex scenes such as those seen in the visual effects industry, this is of course insufficient. Possibilities to overcome this limitation include unified virtual memory (UVM) or NVLINK. 

    The simple answer to your question is multi-GPU scalability. As evidenced by the benchmarks found here, Iray enjoys an almost unprecedentedly small performance-loss factor when utilizing multiple GPUs/CPUs/even entirely separate computing systems for a single rendering task (e.g. two 2080 Tis are almost exactly twice as efficient as a single 2080 Ti). And the primary reason for this is that it does not allow out-of-core rendering (with the exception of NVLink on Linux-based systems), since OOC introduces potentially massive amounts of latency into the overall computation process. And since Iray gets its lineage from Mental Ray (a classic 1990s-2000s Hollywood movie-effects engine, famous for having been used in films like The Matrix) and pre-viz in the auto industry, being able to get the most out of potentially massive numbers of separate compute units has always been a design priority (page 25 of the document linked above has a handy table illustrating this further).
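
    For what it's worth, the scaling claim above is easy to sanity-check against whatever benchmark numbers you have to hand. A tiny helper (the sample rates below are placeholders, not actual 2080 Ti results):

```python
# Turn single-GPU vs multi-GPU benchmark throughput into a scaling-efficiency
# percentage (100% = perfectly linear scaling). Sample numbers are placeholders.
def scaling_efficiency(single_gpu_rate: float, multi_gpu_rate: float,
                       num_gpus: int) -> float:
    return 100.0 * multi_gpu_rate / (single_gpu_rate * num_gpus)

one_card = 5.0    # iterations per second with one card (placeholder)
two_cards = 9.8   # iterations per second with two cards (placeholder)
print(f"{scaling_efficiency(one_card, two_cards, 2):.0f}% of ideal 2-GPU scaling")
```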

  • Ivy Posts: 7,165
    edited December 2019

    So basically that screws the GTX GPUs, making them obsolete for Iray. Well, crap - back to using 3Delight for rendering animation, because as it is, with this version of DAZ I can't render a single scene with Iray without falling back to CPU, and I might as well use 3Delight if I have to render on CPU.

    Post edited by Ivy on
  • Quoting RayDAnt's reply above (the "No Out of Core Data" excerpt and the multi-GPU scalability explanation):

    Thanks for pointing this out and expounding further. It seems to show that Nvidia is testing against the 24GB card limit (the Titan RTX, I assume) and that anything more than this limit would be "complex". Seems to me this is a direct indication that, at least for Iray, if you want to keep up, get a Titan...
  • Quoting the reply above:

    Thanks for pointing this out and expounding further. It seems to show that Nvidia is testing against the 24GB card limit (the Titan RTX, I assume) and that anything more than this limit would be "complex". Seems to me this is a direct indication that, at least for Iray, if you want to keep up, get a Titan...

    I'm not sure what you mean here. Out-of-core memory is not implemented as it would impose unacceptable overheads on performance; they then point out that cards do go up to 24GB and that there are other tools available for reducing memory footprint, to help stay within the limits of in-core memory. It isn't saying that they are abandoning anyone with less than 24GB on their card.

  • I just looked on the Dev blog for iRay (https://blog.irayrender.com/) and noticed this from Dec 9th:

    "Anonymous asked: I was wondering if in Daz3D Studio if using a pair of NVLinked 2080Ti or Titan cards allows for memory sharing or scaling for renders. I currently have a pair of 1080Ti's but can't render anything more than the 11Gb size before defaulting to CPU. I am a 3D content creator and often run out of memory withing Daz3D and would really really like a 1x+1x=2xMEMORY scenario. Is this currently possible with the RTX line up and NVLink?

    Using NVLink is possible in iray in general (see https://raytracing-docs.nvidia.com/iray/manual/index.html#iray_photoreal_render_mode#global-performance-settings), but i don’t know if Daz3D supports it, as it must be explicitly enabled. So i’d suggest bringing this question up on Daz’ forum."

    I recall there was discussion of NVLink in another thread, maybe this one, but I do not recall what the word was regarding this. Is there a plan to implement NVLink capability for iRay in DAZ? This answer indicates it is up to DAZ to support it by explicitly enabling it.

     

  • Quoting the reply above:

    I'm not sure what you mean here. Out-of-core memory is not implemented as it would impose unacceptable overheads on performance; they then point out that cards do go up to 24GB and that there are other tools available for reducing memory footprint, to help stay within the limits of in-core memory. It isn't saying that they are abandoning anyone with less than 24GB on their card.

    I am basing that mainly on this part:

    "As current GPUs feature memory sizes of up to 24GB and the core also supports instancing of objects, scenes originating from the application domain of Iray so far also did not exceed this constraint."

    Since they mention not exceeding the constraint for a 24GB card and do not mention anything below that as the basis for their testing - which, at least for Studio users, is likely 11GB or, more likely, 8GB or less - it seems to me that they are only looking at that amount of VRAM in their analysis of what separates complex (over 24GB) from non-complex (under 24GB). Which leads me to conclude they are not looking at lower-limit cards as a target to optimize for.

  • Quoting the reply above:

    Since they mention not exceeding the constraint for a 24GB card and do not mention anything below that as the basis for their testing ... it seems to me that they are only looking at that amount of VRAM in their analysis of what separates complex (over 24GB) from non-complex (under 24GB). Which leads me to conclude they are not looking at lower-limit cards as a target to optimize for.

    As I said, I took it as meaning "there are options" - 24GB cards being one, using things like instancing being another.

  • marble Posts: 7,500
    RayDAnt said:
    RayDAnt said:

    Would you have a value-range for what the possible "overhead memory increase" would be compared to pre-RTX Iray builds? My impression is it's not insignificant. Like 10-20% more or maybe more.  Again basing this "impression" on one test had mentionned earlier, which may be quite unfair with one case test, and there may be other unaccounted for variables too.

    Unfortunately there is no easy answer to this question for either RTX or  GTX cards because - just like with the performance increases seen with hardware accelerated raytracing - the amount of memory required is scene dependent. More precisely it is scene content abundancy and individual-scene-object-placement dependent, since those are the two biggest factors in determining how complex/memory consuming the creation and storage of a scene's BVH Acceleration Structure is.

    For GTX cards, final render resolution can also be a major factor since more individual pixels needing to be resolved = vastly greater numbers of in-software raytracing ops to perform = more potential for additional memory needed to store intermediate values from ongoing Ray Casting operations since the fundamental dilemma with raytracing has always been that individual Ray Casting operations take an open-ended number of iterative steps to complete. Hence why people using OptiX Prime to render scenes close to the vram limit of their GPUs sometimes get CPU fallback on eg. iteration 666 out of 1400 for no apparent reason. Chances are that the particular combination of paths needing to be traced for each pixel on that iteration (which, by the way, is not technically chosen at random for each iteration despite how convincing Iray's progressive visual updating schema is) might just happen to need the max number of steps to resolve. Leading to a momentary signfiicant uptick in needed working memory beyond what the hardware can provide.

    This is probably pretty far up the ladder but what keeps Nvidia from developing out of core like Octane for this type of scenario where it can utilize more RAM in lieu of VRAM where necessary? I can imagine that iRay's nature is far different than other render engines but it seems like something that could solve issues for iRay if they had this.

    From page 36 of the most recent version of Nvidia's design overview of Iray (found here):

    5.6 No Out of Core Data

    To simplify data access and maximize performance, Iray is not required to support out of core scene data, i.e. each GPU holds a full copy of all scene data. This may limit the size of supported scenes on low-end and older generation GPUs.

    However, this limitation does not apply to the outputs of Iray, as framebuffers can be of arbitrary size and there can be multiple outputs enabled at the same time (see Sec. 4.1).

    As current GPUs feature memory sizes of up to 24GB and the core also supports instancing of objects, scenes originating from the application domain of Iray so far also did not exceed this constraint. For more complex scenes such as those seen in the visual effects industry, this is of course insufficient. Possibilities to overcome this limitation include unified virtual memory (UVM) or NVLINK. 

    The simple answer to your question is Multi-GPU scalability. As evidenced by the benchmarks found here, Iray enjoys an almost unprecedentedly small performance loss factor when utilizing multiple GPUs/CPUs/even entire separate computing systems for a single rendering task (eg. two 2080Tis are almost exactly twice as efficient as a single 2080Ti.) And the primary reason for this is because of not allowing out-of-core rendering (with the exception of NVLink + Linux based systems) since OOC introduces potentially massive amounts of latency into the overall computation process. And since Iray gets its lineage from Mental Ray (a classic 1990s-2000s Hollywood movie effects engine famous for having been used in films like the Matrix) and pre-viz in the auto industry, being able to get the most out of potentially massive numbers of separate compute units has always been a design priority (page 25 of the document linked above has a handy table illustrating this further.)
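    For anyone who wants to put a number on that from their own runs, the usual efficiency calculation is simple - the timings below are made-up placeholders, not figures from the linked benchmarks:

```python
# Multi-GPU scaling efficiency from two benchmark timings of the same scene.
# Times are placeholder values; substitute your own benchmark results.

def scaling_efficiency(single_gpu_seconds, n_gpus, multi_gpu_seconds):
    """1.0 means perfect linear scaling."""
    return single_gpu_seconds / (n_gpus * multi_gpu_seconds)

one_card = 220.0   # seconds for one 2080 Ti (assumed)
two_cards = 112.0  # seconds for two 2080 Tis (assumed)
eff = scaling_efficiency(one_card, 2, two_cards)
print(f"Scaling efficiency with 2 GPUs: {eff:.0%}")  # ~98% in this example
```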

     

    Thanks for pointing this out and expounding further. It seems to show that Nvidia is testing against a 24GB card limit (Titan RTX, I assume) and that anything more than this limit would be "complex". Seems to me this is a direct indication that, at least for iRay, if you want to keep up, get a Titan...

    I'm not sure what you mean here. Out of Core Memory is not implemented as it would impose unacceptable overheads on performance; they are then pointing out that cards do go up to 24GB and that there are other tools available for reducing memory footprint to help manage within the limits of in-core memory. It isn't saying that they are abandoning anyone with less than 24GB on their card.

    I am basing from this part mainly:

    "As current GPUs feature memory sizes of up to 24GB and the core also supports instancing of objects, scenes originating from the application domain of Iray so far also did not exceed this constraint."

    Since they mention not exceeding the constraint on a 24GB card, and do not mention anything below that as the basis for their testing - which, at least for Studio users, is likely 11GB or, more likely, 8GB or less - it seems to me that they are only looking at that amount of VRAM in their analysis for what separates complex (over 24GB) from non-complex (under 24GB) scenes. Which leads me to conclude they are not looking at lower-VRAM cards as a target to optimize for.

    I agree - the recommendation is implicit: just buy our 24GB GPU. That is clearly the way they would like you to go. I tried instancing and it turns out that if I rotate one, I rotate them all. So the applications for instancing are quite limited and certainly not a universal panacea. You could also spend a lot of time reducing textures - more reduction for objects further away from the camera, for example. But this takes time when render times are already long - though still a fraction of CPU render times. I have 8GB and am really struggling with 4.12 to avoid the dreaded CPU fallback. I looked at the price of a 2080ti and it is way out of reach for me. Ideally, I'd like to use RTX on a 2070 Super but that appears to be a complete waste of money if it cannot cope with these VRAM requirements.

    The more I look at this, the more I'm convinced that further participation in this hobby is beyond my budget. Perhaps some combination of DAZ Studio with Unity/Unreal/Blender does offer an alternative.
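    On the texture-reduction idea a couple of paragraphs up, batch-downscaling copies of the maps is one way to speed that chore up. A rough sketch using Pillow - the folder names and the 0.5 factor are hypothetical, and it should only ever be pointed at copies, never the originals in the content library:

```python
from pathlib import Path
from PIL import Image

# Batch-downscale copies of texture maps to cut their VRAM footprint.
# SOURCE/DEST and the scale factor are hypothetical; adjust as needed.
SOURCE = Path("textures_original")
DEST = Path("textures_half")
SCALE = 0.5

DEST.mkdir(exist_ok=True)
for tex in SOURCE.glob("*.jpg"):
    img = Image.open(tex)
    smaller = img.resize((int(img.width * SCALE), int(img.height * SCALE)),
                         Image.LANCZOS)
    smaller.save(DEST / tex.name, quality=90)
    print(f"{tex.name}: {img.size} -> {smaller.size}")
```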

  • marble said:

    I agree - the recommendation is implicit: just buy our 24GB GPU. That is clearly the way they would like you to go. I tried instancing and it turns out that if I rotate one, I rotate them all. So the applications for instancing are quite limited and certainly not a universal panacea. You could also spend a lot of time reducing textures - more reduction for objects further away from the camera, for example. But this takes time when render times are already long - though still a fraction of CPU render times. I have 8GB and am really struggling with 4.12 to avoid the dreaded CPU fallback. I looked at the price of a 2080ti and it is way out of reach for me. Ideally, I'd like to use RTX on a 2070 Super but that appears to be a complete waste of money if it cannot cope with these VRAM requirements.

    The more I look at this, the more I'm convinced that further participation in this hobby is beyond my budget. Perhaps some combination of DAZ Studio with Unity/Unreal/Blender does offer an alternative.

    You should be able to rotate and otherwise transform instances, you just can't make bone-level, shape, or material changes.

  • tikiman-3dtikiman-3d Posts: 35
    edited December 2019

    Anyone having a problem with the new timeline adding more frames? ( 3rd box from the left ) The default is 30 and when I've tried to enter a higher number it goes back to 30 and then crashes the program.

    *UPDATE*

    Yep, they did fix it. It's the 1st box from the left, not the 3rd. Op error.

     

    Post edited by tikiman-3d on
  • barbultbarbult Posts: 24,241

    Anyone having a problem with the new timeline adding more frames? ( 3rd box from the left ) The default is 30 and when I've tried to enter a higher number it goes back to 30 and then crashes the program.

    I thought they fixed that. Are you using 4.12.1.40?

  • marblemarble Posts: 7,500
    marble said:

    I agree - the recommendation is implicit: just buy our 24GB GPU. That is clearly the way they would like you to go. I tried instancing and it turns out that if I rotate one, I rotate them all. So the applications for instancing are quite limited and certainly not a universal panacea. You could also spend a lot of time reducing textures - more reduction for objects further away from the camera, for example. But this takes time when render times are already long - though still a fraction of CPU render times. I have 8GB and am really struggling with 4.12 to avoid the dreaded CPU fallback. I looked at the price of a 2080ti and it is way out of reach for me. Ideally, I'd like to use RTX on a 2070 Super but that appears to be a complete waste of money if it cannot cope with these VRAM requirements.

    The more I look at this, the more I'm convinced that further participation in this hobby is beyond my budget. Perhaps some combination of DAZ Studio with Unity/Unreal/Blender does offer an alternative.

    You should be able to rotate and otherwise transform instances, you just can't make bone-level, shape, or material changes.

    That's odd. I'm not at that computer now but the other day I created a classroom with two desk/chair objects and then created instances from them. I was disappointed to see that when I moved one, all the instances moved (or rotated). I'd like to know how to avoid that because I abandoned that scene.

  • barbultbarbult Posts: 24,241
    marble said:

    I agree - the recommendation is implicit: just buy our 24GB GPU. That is clearly the way they would like you to go. I tried instancing and it turns out that if I rotate one, I rotate them all. So the applications for instancing are quite limited and certainly not a universal panacea. You could also spend a lot of time reducing textures - more reduction for objects further away from the camera, for example. But this takes time when render times are already long - though still a fraction of CPU render times. I have 8GB and am really struggling with 4.12 to avoid the dreaded CPU fallback. I looked at the price of a 2080ti and it is way out of reach for me. Ideally, I'd like to use RTX on a 2070 Super but that appears to be a complete waste of money if it cannot cope with these VRAM requirements.

    The more I look at this, the more I'm convinced that further participation in this hobby is beyond my budget. Perhaps some combination of DAZ Studio with Unity/Unreal/Blender does offer an alternative.

    You should be able to rotate and otherwise transform instances, you just can't make bone-level, shape, or material changes.

    That's odd. I'm not at that computer now but the other day I created a classroom with two desk/chair objects and then created instances from them. I was disappointed to see that when I moved one, all the instances moved (or rotated). I'd like to know how to avoid that because I abandoned that scene.

    That should not happen. How did you create the instances? With Create New Node Instance, Create New Node Instances, or UltraScatter or ?

  • f7eerf7eer Posts: 123

    Good news! The change log for the Private Build Channel 4.12.1.41

    http://docs.daz3d.com/doku.php/public/software/dazstudio/4/change_log#private_build_channel

    shows this:

    Made a change to the timing of when a previous frame's render context is released/garbage collected/etc; this is an attempt to reduce resource consumption and address premature fallback from (NVIDIA) GPU to CPU when rendering with NVIDIA Iray

    Hopeful news! Thanks!

  • Saxa -- SDSaxa -- SD Posts: 872
    edited December 2019
    RayDAnt said:

    Would you have a value-range for what the possible "overhead memory increase" would be compared to pre-RTX Iray builds? My impression is it's not insignificant - like 10-20% or maybe more. Again, basing this "impression" on one test I had mentioned earlier, which may be quite unfair as a single test case, and there may be other unaccounted-for variables too.

    More precisely it is scene content abundancy and individual-scene-object-placement dependent, since those are the two biggest factors in determining how complex/memory consuming the creation and storage of a scene's BVH Acceleration Structure is.

    Could you clarify what you mean by "individual-scene-object-placement dependent"? Trying to understand if there is a best-practice way of laying out the scene for more optimal IRAY processing.

    One guess, based on what I have seen, is that:

    >less light / deeper shadows = magnitudes slower render, vs.

    >very bright scene = much faster render.

    Would guess this is one of the major factors, if not the greatest factor? Posting this based on observations to date. So far I tend to avoid anything but brighter scenes if the scene is heavier. Would that also affect "BVH Acceleration Structure Build" size, as opposed to just raycasting speed?

    Post edited by Saxa -- SD on
  • marblemarble Posts: 7,500
    edited December 2019
    barbult said:
    marble said:
    marble said:
     

    I agree - the recommendation is implicit: just buy our 24GB GPU. That is clearly the way they would like you to go. I tried instancing and it turns out that if I rotate one, I rotate them all. So the applications for instancing are quite limited and certainly not a universal panacea. You could also spend a lot of time reducing textures - more reduction for objects further away from the camera, for example. But this takes time when render times are already long - though still a fraction of CPU render times. I have 8GB and am really struggling with 4.12 to avoid the dreaded CPU fallback. I looked at the price of a 2080ti and it is way out of reach for me. Ideally, I'd like to use RTX on a 2070 Super but that appears to be a complete waste of money if it cannot cope with these VRAM requirements.

    The more I look at this, the more I'm convinced that further participation in this hobby is beyond my budget. Perhaps some combination of DAZ Studio with Unity/Unreal/Blender does offer an alternative.

    You should be able to rotate and otherwise transform instances, you just can't make bone-level, shape, or material changes.

    That's odd. I'm not at that computer now but the other day I created a classroom with two desk/chair objects and then created instances from them. I was disappointed to see that when I moved one, all the instances moved (or rotated). I'd like to know how to avoid that because I abandoned that scene.

    That should not happen. How did you create the instances? With Create New Node Instance, Create New Node Instances, or UltraScatter or ?

    OK - I just loaded that scene again. It has a desk and chair with the chair being a child of the desk. I created instances from that by, I think, Create New Node Instances (certainly not UltraScatter which I don't have).

    Now I can see what is happening, and what was confusing the issue when considering what you and Richard are telling me. It turns out that I can indeed move the desk instances independently. However, if I move the chair of the original desk, all the chairs of the instances move likewise. Conversely, I can't move the chair of an instance - in fact, I can't even select it.

    Post edited by marble on
  • barbultbarbult Posts: 24,241
    marble said:
    barbult said:
    marble said:
    marble said:
     

    I agree - the recommendation is implicit: just buy our 24GB GPU. That is clearly the way they would like you to go. I tried instancing and it turns out that if I rotate one, I rotate them all. So the applications for instancing are quite limited and certainly not a universal panacea. You could also spend a lot of time reducing textures - more reduction for objects further away from the camera, for example. But this takes time when render times are already long - though still a fraction of CPU render times. I have 8GB and am really struggling with 4.12 to avoid the dreaded CPU fallback. I looked at the price of a 2080ti and it is way out of reach for me. Ideally, I'd like to use RTX on a 2070 Super but that appears to be a complete waste of money if it cannot cope with these VRAM requirements.

    The more I look at this, the more I'm convinced that further participation in this hobby is beyond my budget. Perhaps some combination of DAZ Studio with Unity/Unreal/Blender does offer an alternative.

    You should be able to rotate and otherwise transform instances, you just can't make bone-level, shape, or material changes.

    That's odd. I'm not at that computer now but the other day I created a classroom with two desk/chair objects and then created instances from them. I was disappointed to see that when I moved one, all the instances moved (or rotated). I'd like to know how to avoid that because I abandoned that scene.

    That should not happen. How did you create the instances? With Create New Node Instance, Create New Node Instances, or UltraScatter or ?

    OK - I just loaded that scene again. It has a desk and chair with the chair being a child of the desk. I created instances from that by, I think, Create New Node Instances (certainly not UltraScatter which I don't have).

    Now I can see what is happening, and what was confusing the issue when considering what you and Richard are telling me. It turns out that I can indeed move the desk instances independently. However, if I move the chair of the original desk, all the chairs of the instances move likewise. Conversely, I can't move the chair of an instance - in fact, I can't even select it.

    That seems kind of strange and unexpected to me. I'll have to try this and see if the same thing happens to me.
