Network shader compiler task distribution

So I’ve been working on a rather large personal project as of late, and while setting up SWARM on my home server (a decently powerful machine) I got to thinking: why not also allow shader compiling over the network? Say a moderately sized team is working on a large game project, particularly during the asset implementation phase. It would be very useful to have a UE4 build tool server running on a powerful computer to take on and handle all shader compiling. I realize this would require a shared DDC, which is fine, as most teams of this size would be using one anyway. If the compiler machine is reasonably more powerful than the editor's workstation, this could drastically improve efficiency on the editor's end. Furthermore, if a build server were set up, reflection captures, geometry, and third-party integrations such as Steam Audio could be built on that server machine as well, further freeing up resources on the editor's machine.

I’ve run some tests with the tools available to test my theory, and while it was very tricky to pull off, it did indeed prove my point to some extent. I set up two machines:

Machine 1:
Intel Core i9-9900X
16 GB RAM
RTX 2080

Machine 2:
Intel Core i9-9900X
80 GB RAM
2x Titan RTX

Machine 1 is set up as the editing workstation, and Machine 2 is the server that handles compilation/building.

The test project was stored on a NAS and accessed on both machines via an SMB mount; both machines use a shared DDC stored on the same NAS as the project.
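For anyone wanting to replicate the shared DDC part of the setup, it's just a config override pointing the "Shared" cache node at a network share. A minimal sketch below, assuming a launcher/installed engine build (on source builds the section is usually [DerivedDataBackendGraph], and the UNC path is a placeholder for your own share); the simpler route is just setting the UE-SharedDataCachePath environment variable on each machine:

```ini
; DefaultEngine.ini (project) - the path is a placeholder, adjust the flags to taste
[InstalledDerivedDataBackendGraph]
Shared=(Type=FileSystem, ReadOnly=false, Clean=false, Flush=false, DeleteUnused=true, UnusedFileAge=19, FoldersToClean=-1, Path=\\NAS\SharedDDC)
```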

The test goes as follows:
Machine 1 imports some textures and creates a material; Machine 2 should then compile said shaders when Apply or Save is hit, or during material creation. Since I can't do this with the current version of UE4, I created a sort of macro application for Machine 2 in Visual Studio that automatically creates a material when it detects a base color texture in the folder it's watching, adds the normal and roughness maps, sets the material to translucent (the heaviest on shader compilation in my experience), and hits Save. I timed the moments the macro was doing something and the time it took to compile the shaders for 4 materials, and created some materials on Machine 1 as a control.
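For the curious: the macro itself was just UI automation, but what it does maps to editor-side C++ roughly like the sketch below. This is only an illustration of "make a translucent material from three textures and hit Save", not the feature I'm requesting; the asset name/path are placeholders, and the exact headers and sampler types may differ a bit between engine versions (this is against 4.2x, editor-only code, and SaveLoadedAsset needs the Editor Scripting Utilities plugin).

```cpp
#include "AssetToolsModule.h"
#include "Modules/ModuleManager.h"
#include "Factories/MaterialFactoryNew.h"
#include "Materials/Material.h"
#include "Materials/MaterialExpressionTextureSample.h"
#include "Engine/Texture2D.h"
#include "EditorAssetLibrary.h" // Editor Scripting Utilities plugin

// Creates a translucent material from three textures and saves it,
// which is roughly what the macro clicked through in the editor UI.
static UMaterial* CreateTranslucentMaterial(UTexture2D* BaseColorTex, UTexture2D* NormalTex, UTexture2D* RoughnessTex)
{
    IAssetTools& AssetTools = FModuleManager::LoadModuleChecked<FAssetToolsModule>("AssetTools").Get();
    UMaterialFactoryNew* Factory = NewObject<UMaterialFactoryNew>();

    // Asset name and path are placeholders.
    UMaterial* Mat = Cast<UMaterial>(
        AssetTools.CreateAsset(TEXT("M_Test"), TEXT("/Game/Materials"), UMaterial::StaticClass(), Factory));
    if (!Mat)
    {
        return nullptr;
    }

    // Translucent is the heaviest blend mode for shader compilation in my experience.
    Mat->BlendMode = BLEND_Translucent;

    // Add a texture sample node and return it so it can be wired into an input.
    auto AddSampler = [Mat](UTexture2D* Tex, EMaterialSamplerType SamplerType) -> UMaterialExpressionTextureSample*
    {
        UMaterialExpressionTextureSample* Sample = NewObject<UMaterialExpressionTextureSample>(Mat);
        Sample->Texture = Tex;
        Sample->SamplerType = SamplerType; // normal maps need the Normal sampler type
        Mat->Expressions.Add(Sample);
        return Sample;
    };

    Mat->BaseColor.Expression = AddSampler(BaseColorTex, SAMPLERTYPE_Color);
    Mat->Normal.Expression    = AddSampler(NormalTex, SAMPLERTYPE_Normal);
    Mat->Roughness.Expression = AddSampler(RoughnessTex, SAMPLERTYPE_LinearColor); // depends on how the texture was imported

    Mat->PostEditChange();     // kicks off shader compilation, like hitting Apply
    Mat->MarkPackageDirty();
    UEditorAssetLibrary::SaveLoadedAsset(Mat); // equivalent to hitting Save
    return Mat;
}
```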

Note that each time Machine 2 finishes saving a material, it takes a few seconds for it to show up in the File Browser on Machine 1. This may seem insignificant, but it adds 1 to 3 seconds per material saved with remote compilation.
ALSO: after Machine 2 finishes compiling and saving materials, how long Machine 1 takes to load the shaders and display the materials properly varies wildly, from 5 seconds to 50 seconds per material saved with remote compilation. I am leaving these two times out of the results below, since we are talking about freeing up resources and improving efficiency: the editor having to wait a few extra seconds for the materials to show properly is far better than waiting for them to compile locally and being unable to continue working during compilation.

Scenario 1
I imported 9 4K textures on Machine 1 and Machine 2 ran its macro. All the while, I was able to continue importing textures and doing work while Machine 2 did its thing. Now, with the feature I'm requesting, the editor user would still need to create the material, so I accounted for the roughly minute and a half it would take me to create such a material and subtracted it from the result time.

[SPOILER]


S1 Results:
Importing textures: **1 min 52 sec**
Time it would take editor to create material: **1 min 22 sec**
Shader compilation time on Machine 1: **N/A**
-- Shader compilation times per material on Machine 2 --
Mat 1: **1 min 20 s**
Mat 2: **1 min 15 s**
Mat 3: **1 min 16 s**
Mat 4: **57 s**

[/SPOILER]

Total time for editor user before being able to move on to more work: 3 minutes 23 seconds

*Note: this does not count any shader compilation time, just the time it took to make all the necessary clicks and wait for the editor to catch up. Shader compile time is timed separately, covering any time the shader compiler process is running and UE4 says “Compiling Shaders”.*

Scenario 2
I imported 9 4K textures on Machine 1 and created the materials myself, allowing the shaders to compile on my editing workstation.
[SPOILER]


S2 Results:
Importing textures: **1 min 59 sec**
Time it took to create materials: **3 min 14 sec**
Shader compilation time on Machine 2: **N/A**
-- Shader compilation times per material on Machine 1 --
Mat 1: **1 min 30 s**
Mat 2: **1 min 28 s**
Mat 3: **1 min 28 s**
Mat 4: **1 min 26 s**

[/SPOILER]

Total time for editor user before being able to move on to more work: 11 minutes 8 sec

Now, as the requested feature would obviously not create materials automatically, let's replace the material-creation time from Scenario 1 with the material-creation time from Scenario 2. This will tell us roughly how long it would take an average editor user to import some textures, create a few materials, and save those materials before being able to move on to more work.

RESULTS:
[SPOILER]Here we will see a very, VERY, VERY rough estimation of how long it would take to do the actions stated above, but with a second machine handling the heavy lifting (compiling shaders)

Total time to import textures, create materials and move on while Machine 2 compiles shaders: 5 min 1 sec

VS

Total time to import textures, create materials and WAIT while current machine compiles shaders: 11 min 8 sec[/SPOILER]

You can see the stark difference there. Again, this is a super rough estimation, and I didn't have much time to do more testing and validate my results, but I have a pretty good feeling they would come out about the same. I'm also still rather fuzzy on how the shared DDC system actually functions as a whole, due to the lackluster documentation in that area. It's not hard to set up, but understanding it is another matter. Does it share everything from the local DDC? But I'm getting off track.
Now, this test has primarily focused on shader compilation during material creation (when hitting Apply or Save), but this would also significantly boost editor performance when the editor is compiling shaders in the background, such as when migrating a large amount of assets from one project to another or while editing the material itself. Shader compiling can hog resources and make the editor almost unusable while it runs; offloading it to another machine would free up all of those resources.

Anyway, I hope you guys think about this, as I do believe it would increase editor efficiency among development teams.

PS: Does anyone know why compiling shaders sometimes goes into the negative? For instance, it'll start saying “Compiling Shaders 500” then count down to “Compiling Shaders -201”.

This already exists; it's called XGE shader compiling. Unfortunately, it requires IncrediBuild to be installed and properly licensed.

Ah, I didn't know someone had already made it. From what it looks like, IncrediBuild appears to be a cloud-based service? I may be wrong though. Their website is really fuzzy on the details; “accelerate builds” is pretty much all it explains.

I sincerely hope Epic takes this into consideration, as it could vastly improve efficiency and performance for teams from small to huge. It could easily be set up so the server machine doesn't even have to run the UE4 editor. Just make a coordinator program, similar to SwarmCoordinator, that handles distribution of tasks, plus a basic GUI on both machines to handle config and monitor performance/progress. Heck, they could even integrate the functionality into the Swarm system. Being able to import a large amount of textures and breeze through making materials, hitting Save on each one and not having to wait anywhere from 40 seconds up to 2 minutes while it compiles before moving on to the next task, is a dream.

Just make your material, hit save, and the server compiles the shader OR the coordinator distributes the task evenly between multiple machines.
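To illustrate the "distributes the task evenly" part, here's a toy sketch (plain C++, nothing to do with how Swarm is actually implemented): the coordinator just hands each shader job to whichever registered machine currently has the fewest jobs queued. The machine and material names are made up.

```cpp
#include <iostream>
#include <string>
#include <vector>

struct Worker
{
    std::string Name;
    int PendingJobs = 0; // jobs currently queued on this machine
};

// Pick the worker with the fewest queued jobs so no single machine becomes the bottleneck.
Worker* PickWorker(std::vector<Worker>& Workers)
{
    Worker* Best = &Workers[0];
    for (Worker& W : Workers)
    {
        if (W.PendingJobs < Best->PendingJobs)
        {
            Best = &W;
        }
    }
    return Best;
}

int main()
{
    std::vector<Worker> Workers = { {"BuildServer"}, {"Workstation2"} };
    std::vector<std::string> ShaderJobs = { "M_Rock", "M_Glass", "M_Water", "M_Skin" };

    for (const std::string& Job : ShaderJobs)
    {
        Worker* W = PickWorker(Workers);
        ++W->PendingJobs;
        std::cout << Job << " -> " << W->Name << "\n";
    }
    return 0;
}
```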
One thing I hate is when I upgrade to a new version of Unreal Engine, OR the DDC is purged, and I have to wait ages while it recompiles upwards of 100k to 400k shaders depending on the size of the project. Even with an Intel Core i9-9900X and its 10 cores, on a moderately sized project this can take quite a while and hog up every ounce of computing power available, locking me out of even the most mundane tasks in Windows; setting the priority to super low or restricting it to specific cores just hinders the process and extends the time it'll take to compile.
In those events it'd be absolutely lovely if a network PC or server picked up the slack and I was able to dive into working on Blueprints or integration, or heck, even watch some YouTube while the network computer compiled, saving the local computer's resources.

Not sure about cloud. Anyway, it would be most efficient via LAN, since the products of code/shader compilation are huge: gigabytes of data.

Yeah, their page for IncrediBuild is a bit vague. And yeah, this would really only be possible over LAN. I mean, I could see a AAA studio using a cloud service for it if both ends had internet at 10 Gbps or more.

Anyway, buying new Ryzen CPUs for shader artists could be a better investment than IncrediBuild. Their pricing per core is seriously insane.
Especially since the Ryzen 3900X (12-core, 24-thread) costs only $499, and a 16-core Ryzen is coming in September.

The Intel i9-9900X is quite ineffective nowadays, with only 10 cores. The Threadripper 2950X with 16 cores is cheaper, although motherboards for Threadrippers are quite expensive.
Anyway, even if you stay with Intel, it could help to add more RAM. I followed the advice of putting in 1 GB per CPU core, and this works beautifully for code/shader compilation.

The most effective now would be:

  • set up a shared shader cache location on the server
  • seriously, sell the Intel machine and build an AMD-based workstation with a lot of RAM

This will also greatly speed up compiling the engine and cooking the game later on.

After switching from an old Core i7 + 16 GB RAM to a Threadripper 2950X + 32 GB RAM, cooking is up to 6x faster.
Windows and apps stay responsive even when rebuilding the engine with 32 threads at 100% utilization.

One last thing: I noticed that rebuilding the DDC from scratch is twice as fast on an M.2 SSD as on a SATA SSD (if the project and cache are located on the same drive).

I think you've completely missed the point here. I didn't come here for advice on which CPU I should purchase for my next upgrade lol … I came here to suggest a feature that would be very useful for development teams.
Both my desktop and server work perfectly fine for all of the tasks they are assigned.
Being able to offload shader compiling onto a central server so the editor can instantly continue working after material creation or what have you is a major benefit in my eyes.

But anyway, I'm fine with the i9-9900X for now; I built these machines in January, so I've got a ways to go before it's time to upgrade. And if AMD is still in the lead when upgrade time rolls around for me in a year or two, I'd definitely go with them. But regardless of that situation, I'm perfectly happy with my systems as they are. Compared to my old i7-4770K with 32 GB of RAM, my machine blows it out of the water in terms of speed when compiling UE4 or compiling shaders.

Edit: wait a minute, I just noticed… did you just recommend someone get 1 GB of RAM per CPU core? That's a bit silly. The amount of RAM one should put in their workstation is entirely dependent on the workload they intend to run on that machine, and you should always give yourself a little extra RAM headroom as well. Saying “oh, if you have a 4-core CPU, 4 GB of RAM is perfectly adequate” is just silly! And by that logic both my server and my gaming desktop have more than enough RAM for the job. The i9-9900X has 10 cores and 20 threads. I will assume you meant cores when you said 1 GB per core. That would give my gaming desktop 1.6 GB of RAM per physical core. My server, on the other hand, which I generally use for working in Unreal Engine, has 80 GB of RAM; that's 8 GB per physical core. Of course I wouldn't measure the RAM needed per CPU core, as again… silly. My server's RAM needs were carefully calculated based on all the workloads I had planned for it, same as with my gaming machine. I don't need 300 GB of RAM to compile shaders. You completely missed the point here, which is to have another machine compile shaders for you. I never said I was going to invest in IncrediBuild. I have no need.

IncrediBuild is LAN-based, not cloud. Distributed shader compilation is one of their key features, really the only one worth the silly pricing they have. This means we won't see native support for this feature, though.

Per thread. And yes, while compiling the engine, the entire 32 GB of RAM is well utilized in the case of a 32-thread Threadripper. If it isn't, there's memory left over for other apps in the meantime, saving you precious time.
Also, that amount of memory helps if you're using the machine for crunching a lot of code/data: compiling the engine/shaders, cooking, validating data, batch fixing redirectors (loading a lot of assets into memory).

There's an excellent post on building a workstation for UE4 by Claybook's engineer.

Obviously, this was just a suggestion about what could make your workflow faster right now.
I have nothing to add to your request, and nothing against it. Just let's be realistic here: Epic either chooses to use IncrediBuild for shared shader compiling or invests a few bucks more into workstations.
Although they greatly improved shader compilation in the latest release, up to 3x faster thanks to better memory alignment of processed data. Perhaps they will implement the solution you requested… well, I simply doubt they're going to do it this year, or in 2020.
Notice they almost never implement things just because they're requested on the community forums. Even if they did, it would be because it's a common issue, not because they read this thread.

Thus I simply assumed it would be helpful to suggest to folks on the forum what they could do if Epic won't do anything with this request. I didn't address just you. I expect many people could read this and spend money on more efficient Ryzen/Threadripper CPUs when upgrading their rigs.

You're right to say “dude, you're missing the point here”. Although I can assure you, you'd save much more time by upgrading your rig again instead of relying on Epic.

What do you mean we won't see native support? You mean for IncrediBuild's version of this feature?
I would hope Epic would at least consider adding a feature like this to the engine; seeing as they already have distributed lightmap building, I don't really see why this would be such a stretch. They could even utilize the current Swarm tools for distributing the tasks, and it would allow teams to use the hardware they already have coughDoctorErgotcough, save money, and increase efficiency.

Alright, I get that, but the problem is, most teams and studios are going to use the hardware they have available. The cost of upgrading every workstation with new motherboards and CPUs would very quickly outweigh the benefits. Not to mention, companies still heavily rely on Intel, like it or not. That may change soon enough with AMD's latest offerings, but for the time being Intel is still king of the hill in the business world.
The same goes for me: tell me why I should scrap the several thousand dollars in hardware I have right now and go out and drop more money on a new system for minimal gains. It's not like I am using a 10-year-old CPU. My rigs are barely 7 months old and can very easily hold their own, especially up against a TR 2950X, which only shows a marginal 5% gain in benchmarks.

That link you posted is INSANELY biased towards AMD. They put benchmarks for older-gen Intel CPUs sporting 4 and 8 cores up against AMD HEDT CPUs two generations ahead, which is absolutely unfair.

You do you, pal. I can see you really like AMD; that's cool. For me, I don't care what logo is on my CPU as long as it gets the job done, and if what I have works perfectly fine, then I am not going to go out and waste money on upgrades I don't need, and I know most people would feel the same way when their current systems work fine. In this case upgrading isn't the solution, or the point at all. The point is that having this feature would increase efficiency with minimal to nil effort on the user's end, regardless of hardware. It took me 5 minutes to set up Swarm on both computers; how is that not easier than upgrading your system and STILL having to compile shaders locally? No matter how fast your CPU is, you still have to wait while those shaders compile when you save a material. The entire point of this is to make it so the editor user doesn't have to wait. Just hit Save and move on to the next task.

I like 16 cores more than 10 cores; it's not about companies. And the Threadripper is my first AMD product ever, so hardly a fanboy.
The comparison is fair: they compared their current CPU to their previous one. It's a gamedev studio, not a review outlet. Any generation of Intel Core i7 would be weak in comparison to a 32-thread CPU supporting quad-channel memory. That's it.

That's the case for video games and other apps that don't utilize all cores.
Unreal Build Tool and the shader compiler utilize all available cores beautifully. It's not physically possible that a Threadripper 2950X would be only 5% faster if the system is properly set up (i.e. with enough RAM).
Switching from 4 to 16 cores increased the speed of heavy UE4 operations 3-6x for me. The engine compiles 4x faster.

This is an example of a proper comparison.
http://hwbench.com/cpus/intel-core-i…adripper-2950x

The 12-core Ryzen 3900X has almost the same price as the 8-core i9-9900K, and half the price of the 10-core i9-9900X.
The 16-core 3950X will make the i9 series look even more ancient. You could actually get some money back if you sold your current CPU…

Nope, Intel isn't king anymore if we're talking about HEDT or server CPUs. A lot of people buy Intel CPUs for workstations because they're used to buying Intel.
Just compare prices, man.

It seemed pretty clear to me that they meant it in practical terms, based on what the most common hardware is, not what the best value to buy right now is.

@rdtg In terms of native engine support, I guess this just falls into the category of things that Epic doesn't have any real internal use for: they use IncrediBuild, and support for that is built into UBT. So they don't have much incentive to reimplement it.

With the editor blocking when saving a material, I'd assume that is probably a separate issue from where the compilation is being done? UE4 has what seems to me a pretty major issue with the way so many core asset types can't exist in memory without their render resources first being available, hence the UI blocking on things like mesh building and texture compression the first time assets are loaded. My guess is it's a similar thing that causes the block when saving a material, and fixing that synchronous requirement would be independent of offloading the processing elsewhere.

I'd rather Epic find a way to not require all 200+ shader variants for a single material to be compiled before the material can be displayed at all. Queueing them up for compilation on demand, or moving variants requested for the current frame to the top of the queue, would speed up iteration by a lot.


I'd imagine this could cut the time wasted waiting for shader compiling by 50x or more. There are so many unnecessary shaders compiled that are never used; it's ridiculous.

This just got merged into the 4.23 branch: Checked in 20% Shader compiler speedup from Fortnite branch
GitHub link

Did some research on this topic recently.

It is possible to have cheap distributed network shader compilation support, and it is NOT that difficult from a programmer's perspective, so my guess is that Epic doesn't have any incentive to do it since they already have IB support built in.

You can implement distributed shader compilation using FASTBuild (FASTBuild - High-Performance Build System), but you have to integrate some code and recompile UE4 yourself; there is no official support, so you need to have some programming skills.

The code needed is here: https://github.com/fastbuild/fastbuild/issues/539

With the code there I have successfully implemented distributed shader compilation using 2 machines, with some modifications for 4.23.1.
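If it helps anyone else trying this: once the engine-side changes from that issue are in, the helper machines only need the stock FASTBuild worker running and a brokerage folder that every machine can reach over the network. Roughly like this (the share path is a placeholder; check the FASTBuild docs for the exact worker options available in your version):

```
:: On each helper machine, from the folder containing the FASTBuild binaries
set FASTBUILD_BROKERAGE_PATH=\\NAS\FASTBuildBrokerage
FBuildWorker.exe -mode=dedicated
```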