I can add 2 more things:
- This does not happen on every cook.
- It does not seem to be random. It happens more often the closer the system gets to maximum RAM usage. We monitor the usage, have adjusted the core count, and have set up the config entries, but it seems we need around 12-16 GB of free RAM for the issue not to show up. We’re cooking on a 128 GB RAM machine, and we’ll be upgrading it because the gains are insane once it is set up correctly: we shaved 1h15min to 1h20min off a 1h40min cook. So the issue goes away when we leave some breathing room.
My best guess so far is that .NET Host needs some RAM of its own, and if we use most of the memory for cooking (we set the values to stay at 90-95% RAM usage), it stops working correctly, or at least in reasonable time. I have not had time to debug it further. I’ll try to gather the insights file for you so the feature has some data to build on, in case this can be addressed on your side.
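To put rough numbers on that guess, here is a minimal sketch of the headroom arithmetic for our 128 GB machine. The helper names are made up for illustration and are not part of any engine or cooker API; the only inputs are the figures quoted above (128 GB total, a 90-95% usage target, and the 12-16 GB of free RAM we seem to need).

```python
def free_ram_gb(total_gb: float, usage_fraction: float) -> float:
    """RAM left free when the machine sits at the given usage fraction."""
    return total_gb * (1.0 - usage_fraction)


def max_usage_fraction(total_gb: float, headroom_gb: float) -> float:
    """Highest usage fraction that still leaves headroom_gb free."""
    return (total_gb - headroom_gb) / total_gb


TOTAL_GB = 128.0  # build machine RAM quoted in this post

# How much RAM stays free at the usage targets we currently aim for.
for usage in (0.90, 0.95):
    free = free_ram_gb(TOTAL_GB, usage)
    print(f"{usage:.0%} usage leaves {free:.1f} GB free")

# Which usage target would preserve the headroom that seems to be needed.
for headroom in (12.0, 16.0):
    target = max_usage_fraction(TOTAL_GB, headroom)
    print(f"{headroom:.0f} GB headroom means a target of at most {target:.3f}")
```

On 128 GB, a 95% target leaves only about 6.4 GB free and a 90% target leaves about 12.8 GB, which is right at the bottom of the range where the issue stops showing up. Keeping 16 GB free would mean a target of 87.5%.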
Below is a screenshot from a 4-core cook on my dev machine (not the build farm). There’s no Google Chrome or Slack on the build machine, so the memory targets there are closer to the maximum. As you can see, there are still 2 issues to work out:
- .NET Host scales with the amount of work the cooker does: the more work happens, the more RAM it consistently eats up. The build farm is set up to use 9 cores, so its usage goes higher there. That is what I base my guess on that this is the process that needs memory in order not to block cooking. Something in the cooker uses that .NET Host, because it goes away completely when the cook ends.
- Microsoft Defender tries to keep up with the files that all the cook processes load from the drive. The more processes I use, the more CPU power Defender takes, which slows cooking. I had major gains when I killed Defender, but that’s far from optimal.
[Image Removed]