What would be necessary to get a network queue render rig?


Comments

  • Kitsumo Posts: 1,216
    edited November 2018

    I think the telling part is the memory usage. DS has a well known memory leak during renders. Had you run several renders in a row without closing DS before you ran this test? That would perfectly explain the results you display.

    No, for most of those I closed DS and reopened the scene (not every single time, though). From what I've read on the forums, DS normally uses 2-3 times as much RAM as the GPUs use VRAM. For my card, I really should have 24-32 GB of system RAM (https://www.daz3d.com/forums/discussion/267526/system-ram-requirement-for-a-1080ti). Depending on how this weekend goes, I might do some more testing and try to be a little more scientific.
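    As a quick sanity check, that rule of thumb is just multiplication - this sketch uses the 2-3x forum estimate and an assumed 11 GB card (a 1080 Ti), not measured numbers:

```python
# Rough rule of thumb from the thread: system RAM ≈ 2-3x total GPU VRAM.
# An 11 GB card (e.g. a 1080 Ti) therefore suggests roughly 22-33 GB of RAM.
def suggested_ram_gb(vram_gb, low=2, high=3):
    """Return the (low, high) system-RAM estimate for a given VRAM size."""
    return vram_gb * low, vram_gb * high

low, high = suggested_ram_gb(11)
print(f"Suggested system RAM: {low}-{high} GB")  # 22-33 GB
```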

    An interesting product showed up in the store today. The promo pictures have a lot of information I've been looking for. I still can't figure out how he can see how much memory is being used by textures vs geometry, or whether compression is being used. I don't know if that's info that I can access by myself or if I have to buy the product. I'll check it out more later. For now, it's off to work.

    Edit: I found entries for geometry & textures in the log, so I guess I'll try using that.
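    Since the exact wording of those log entries varies by DS/Iray version, here's a rough sketch of how they could be pulled out of a saved log with a script - the keywords and units in the pattern are assumptions, so adjust them to match what your log actually says:

```python
import re

# Hypothetical sketch: scan a Daz Studio log for lines mentioning geometry
# or texture memory. The keywords and units below are assumptions about the
# log format and may need adjusting for your DS version.
PATTERN = re.compile(r"(geometry|texture).*?(\d+(?:\.\d+)?)\s*(MiB|MB|GiB|GB)",
                     re.IGNORECASE)

def find_memory_lines(log_text):
    """Return (kind, amount, unit) tuples for matching log lines."""
    return [(m.group(1).lower(), float(m.group(2)), m.group(3))
            for m in PATTERN.finditer(log_text)]

sample = "IRAY rend info : Texture memory consumption: 512.5 MiB"
print(find_memory_lines(sample))
```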

     

    anonbach said:

    Whoa guys didn't get a chance to come back to see how it's going. Interesting. 

    Yeah, sorry. I didn't mean to get so far off topic. It just sort of happened. Did you ever decide what to do with your system?

  • Yeah. Don't have a problem with this going off discussion. As I said, interesting. 

    Well, I'm going to go with what I planned: 2 x 2080Ti and 1 x 2070 (I would have gone with another 2080Ti, but I only have a 1000W PSU, and PCIe 3.0 x4 limits the level of GPU I can put into the last slot).

    Should be interesting. I'm going to have to learn how to optimize scenes so I can actually render more than a few seconds of footage without waiting for hours and hours (even with that config). 

    I could go HEDT and get more lanes, but that would be even worse price to perf since I don't plan on using the CPU to render at all and don't want to pay the inflated prices for 32GB+ DDR4 memory (need quad channel). 

    Otherwise as long as it is a modern-ish Core i7 CPU, should be fine as I'm offloading all the real work to the GPUs, right? 

  • Kitsumo Posts: 1,216
    anonbach said:

    Yeah. Don't have a problem with this going off discussion. As I said, interesting. 

    Well, I'm going to go with what I planned: 2 x 2080Ti and 1 x 2070 (I would have gone with another 2080Ti, but I only have a 1000W PSU, and PCIe 3.0 x4 limits the level of GPU I can put into the last slot).

    Should be interesting. I'm going to have to learn how to optimize scenes so I can actually render more than a few seconds of footage without waiting for hours and hours (even with that config). 

    I could go HEDT and get more lanes, but that would be even worse price to perf since I don't plan on using the CPU to render at all and don't want to pay the inflated prices for 32GB+ DDR4 memory (need quad channel). 

    Otherwise as long as it is a modern-ish Core i7 CPU, should be fine as I'm offloading all the real work to the GPUs, right? 

    Yes, your CPU should be fine. Generally it's best to select only the GPUs for rendering; the CPU will have plenty of work sending and receiving data to and from all the GPUs. But if you want to be sure, just run a benchmark with the CPU enabled and one without, and see which way is faster.

    I wouldn't worry too much about PCIe bus speeds. Here's a test someone ran on bus speed: https://direct.daz3d.com/forums/discussion/175226/effects-of-pci-e-bandwidth-on-load-and-render-times. Overall, it may take a few seconds longer to initialize, but the speed boost of having an extra card more than makes up for it. Heck, even in my setup that I linked earlier, the cards took about 2 seconds longer to initialize, but the render finished 22 seconds faster overall. Whatever you decide to do, good luck. It sounds like you're on the right track.
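    That trade-off is easy to sanity-check with arithmetic; using the numbers above (roughly +2 s initialization, -22 s render):

```python
# Net effect of adding a card behind a slow PCIe link, using the
# figures quoted above: initialization ~2 s slower, render ~22 s faster.
def net_time_change(init_penalty_s, render_savings_s):
    """Negative result means the render finishes sooner overall."""
    return init_penalty_s - render_savings_s

print(net_time_change(2, 22))  # -20: 20 seconds saved overall
```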

  • You could get an external GPU cluster which only takes up one PCI lane in your PC, such as the one from Amfeltec. I have one (well, had, I think I baked it), and when it worked, it was noticeably faster. I was shooting for 4 1080tis on it but it died before I could get the GPUs together.

    The only thing I don't like about it is that it can't report to the PC as one gigantic GPU; it reads as (up to) 4 separate GPUs, and either Windows or my MOBO has trouble with more than 5 (the Titan Z reports as 2 GPUs, since it's basically 2 GPUs on one card). So I had that one card (counting as 2 GPUs), 3 other GPUs on the cluster (a 980 and 2 780 Tis), and a Titan X (Pascal) in the machine. I had to butcher the Device Manager tree to get them all working - killing USB ports and the like to free up resources.

    I need to get a few things squared away before replacing the cluster and adding more 1080 Tis, but I'm hoping the RTX cards have VRAM pooling via NVLink. If so, I'll head that way instead, but I'll probably need to get a more robust MOBO than the Z87 Pro and i7-4770K I'm using now - I still don't know if resource consumption is a MOBO thing or a Windows thing. I'm also seeing bits and bobs about RTX cards requiring Win10, and I have not heard one good thing about Win10.

    However, there are also mining rig setups that should interface similarly to the Amfeltec cluster, and I'm seeing those in 6+ GPU configs - basically looking like Nvidia VCAs on a budget. However, the question still remains as to whether they report to the PC as one giant GPU or not, because Windows needs to be able to detect them all for them to be useful in Iray.

  • Kitsumo Posts: 1,216

    You could get an external GPU cluster which only takes up one PCI lane in your PC, such as the one from Amfeltec. I have one (well, had, I think I baked it), and when it worked, it was noticeably faster. I was shooting for 4 1080tis on it but it died before I could get the GPUs together.

    The only thing I don't like about it is that it can't report to the PC as one gigantic GPU; it reads as (up to) 4 separate GPUs, and either Windows or my MOBO has trouble with more than 5 (the Titan Z reports as 2 GPUs, since it's basically 2 GPUs on one card). So I had that one card (counting as 2 GPUs), 3 other GPUs on the cluster (a 980 and 2 780 Tis), and a Titan X (Pascal) in the machine. I had to butcher the Device Manager tree to get them all working - killing USB ports and the like to free up resources.

    I need to get a few things squared away before replacing the cluster and adding more 1080 Tis, but I'm hoping the RTX cards have VRAM pooling via NVLink. If so, I'll head that way instead, but I'll probably need to get a more robust MOBO than the Z87 Pro and i7-4770K I'm using now - I still don't know if resource consumption is a MOBO thing or a Windows thing. I'm also seeing bits and bobs about RTX cards requiring Win10, and I have not heard one good thing about Win10.

    However, there are also mining rig setups that should interface similarly to the Amfeltec cluster, and I'm seeing those in 6+ GPU configs - basically looking like Nvidia VCAs on a budget. However, the question still remains as to whether they report to the PC as one giant GPU or not, because Windows needs to be able to detect them all for them to be useful in Iray.

    The Amfeltec idea seems pretty cool, although the tower/cluster looks like it'd be easy to knock over or spill something on by accident. I can tell you that the mining rigs, at least the one I have, report as separate GPUs. I don't think there's a way to have multiple GPUs report as one unless it's done by the manufacturer.

    As far as device resources, I think it varies by motherboard. Mine has 2 PCIE slots, 1 x16 and 1 x4, but to get my extra cards working I had to reduce the speeds to free up resources. But on the other hand, if I had money to burn, there are gaming motherboards with 4 PCIE x16 slots.

    I think the best bang for the buck is to use mining hardware. I built my setup for less than $50 ($30 for PCIE risers, $8 for milk crate, ~$10 for extra hardware) not including the power supply. Of course building a high end setup would mean using more heat resistant materials and a better power supply.
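    For what it's worth, that parts list adds up like this (the ~$10 extra-hardware figure is approximate, so it's a ballpark total):

```python
# Ballpark cost of the milk-crate mining-style rig described above,
# excluding the power supply. The $10 "extra hardware" figure is approximate.
parts = {"PCIe risers": 30, "milk crate": 8, "extra hardware": 10}
total = sum(parts.values())
print(f"Total: ~${total}")  # ~$48, under the $50 quoted
```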

  • The pictures don't give a sense of scale, but the Amfeltec frame is about 12"-14" high and about 8" x 8" square, and it's pretty stable. I had it sitting on top of a small bass combo amp next to the computer shelf.

    But then, the way the cards are oriented on it, you'd need a custom SLI/Link cable that was flipped at one end, so even if NVLink did work, it wouldn't work, unless the cable was made to match the configuration.

    My MOBO has 4 PCI slots, but the only way to get 4 GPUs into it is to use single-width/skinny Quadros like the K4000. There's no way to get 4 *useful* GPUs into it.

    I was also able to get an Akitio Thunderbolt box hooked up to one slot, even though it wasn't specifically for GPUs. However, it's too small for a real GPU like the 1080 Ti, much less a Titan Z (3-wide), so I had a Titan Z with this bare board stuck on it, standing on its top, propped up by a CD spindle on one side and the PSU for the Titan Z on the other, with the foil bag it came in as insulation. 

    It's been said that PCI bus speed isn't as critical for rendering, and even with all my slots cut to x4, I was still getting decent render times. However, since there was no way to force my PCI bus speed to change with all the GPUs attached, I can't really prove that bus speed isn't a factor.

  • Kitsumo Posts: 1,216

    The pictures don't give a sense of scale, but the Amfeltec frame is about 12"-14" high and about 8" x 8" square, and it's pretty stable. I had it sitting on top of a small bass combo amp next to the computer shelf.

    But then, the way the cards are oriented on it, you'd need a custom SLI/Link cable that was flipped at one end, so even if NVLink did work, it wouldn't work, unless the cable was made to match the configuration.

    My MOBO has 4 PCI slots, but the only way to get 4 GPUs into it is to use single-width/skinny Quadros like the K4000. There's no way to get 4 *useful* GPUs into it.

    I was also able to get an Akitio Thunderbolt box hooked up to one slot, even though it wasn't specifically for GPUs. However, it's too small for a real GPU like the 1080 Ti, much less a Titan Z (3-wide), so I had a Titan Z with this bare board stuck on it, standing on its top, propped up by a CD spindle on one side and the PSU for the Titan Z on the other, with the foil bag it came in as insulation. 

    It's been said that PCI bus speed isn't as critical for rendering, and even with all my slots cut to x4, I was still getting decent render times. However, since there was no way to force my PCI bus speed to change with all the GPUs attached, I can't really prove that bus speed isn't a factor.

    What's preventing you from changing it? I know every motherboard is different; mine has settings in the BIOS. The only drawback to mine is that (as far as I can tell) I can only set one speed for all slots; I can't have 1 at x8 and 3 at x2 or something - there's just one global setting. I don't think PCIe bus speed is a huge drawback. In the testing I did with my setup, the biggest slowdown I had was about 4 seconds in initialization time, but the render still finished 22 seconds faster because of the extra card.

    I was going to do more testing with texture compression and VRAM usage this weekend, but I think I've beaten that horse beyond death. Time to move on to something else.
