swarm installation

Thanks. If you look at my last post, I think those variables are set correctly now; at least the SwarmCoordinator shows what looks like functional communication. Both of my machines are in a group called WORKGROUP, and this is specified for both AgentGroupName and AllowedRemoteAgentGroup. Each machine also has the other machine specified in AllowedRemoteAgentNames.

My breakthrough came after changing AgentGroupName and AllowedRemoteAgentGroup from the default to WORKGROUP, which is what my Windows group is set to.

Once I get it working between two machines, I will add a third, also in group WORKGROUP, by adding its name to AllowedRemoteAgentNames after a comma.

Right?

Yes, it should work, but you might as well use ‘*’.
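
For anyone unsure how a comma-separated name list and a ‘*’ wildcard relate, here is a minimal Python sketch of that kind of matching. It is only an illustration, not Swarm's actual code: the function names are hypothetical, and the case-insensitive, glob-style comparison is an assumption about how the agent treats AllowedRemoteAgentNames.

```python
from fnmatch import fnmatchcase
from typing import List


def parse_allowed_names(setting: str) -> List[str]:
    """Split a comma-separated setting into trimmed, non-empty patterns."""
    return [part.strip() for part in setting.split(",") if part.strip()]


def is_agent_allowed(agent_name: str, setting: str) -> bool:
    """Return True if agent_name matches any pattern in the setting.

    Assumption: names are compared case-insensitively and '*' matches
    any run of characters (glob style).
    """
    name = agent_name.strip().upper()
    return any(
        fnmatchcase(name, pattern.upper())
        for pattern in parse_allowed_names(setting)
    )


if __name__ == "__main__":
    # An explicit list of two machines versus a catch-all wildcard.
    print(is_agent_allowed("ALEXANDER", "ALEXANDER, RENDER02"))
    print(is_agent_allowed("RENDER03", "*"))
```

Under those assumptions, adding a third machine means either appending its name after another comma or replacing the whole list with ‘*’.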

Nevertheless, are you sure the level you’re testing is producing enough work for distribution?

Thanks,

Obviously I am no expert, but the bake takes more than the 16GB of RAM my local machine has and pushes everything else, including the OS, so far into swap that the machine is unusable for 20 minutes while it bakes. CPU usage is really low most of the time, because the data it needs is in swap rather than in RAM. This tells me the bake is way overtaxing this machine. I even have AvoidLocalExecution turned on.

I did some major streamlining of my scene last night, which seems to have helped. I learned, to my surprise, that objects in the scene which are hidden still affect bakes. I had hidden a lot of items in my scene to make it simpler to test, but when I deleted them instead of hiding them, the bake got way faster and used way less RAM. It also gives better-looking results. This is very counterintuitive to VFX people like me, who are used to DCC apps that allow objects to be in the scene but not rendered.

TBH I’m not sure if AvoidLocalExecution still works. I’ll verify it and get back to you.

Some progress, and a new error message.

My local machine, which is the coordinator, seems to be unable to ping itself.

Here is the log:

9:32:55 AM: [Job] PID is 25072
9:32:55 AM: [Job] GUID is “0483C3CA-4FD249D0-0C1FB399-E72F869E”
9:32:56 AM: LogLightmass:Display: Lightmass Win64 started on: ALEXANDER. Command-line: 0483C3CA4FD249D00C1FB399E72F869E -numthreads 10
9:32:56 AM: Lightmass Win64 started on: ALEXANDER. Command-line: 0483C3CA4FD249D00C1FB399E72F869E -numthreads 10
9:32:56 AM: LogLightmass:Display: Processing scene GUID: 0483C3CA4FD249D00C1FB399E72F869E with 10 threads
9:32:56 AM: Processing scene GUID: 0483C3CA4FD249D00C1FB399E72F869E with 10 threads
9:32:56 AM: Building static lighting…
9:32:56 AM: [Job] Found a parent connection for PID 25072
9:32:56 AM: [Job] 4F737602 -> 6D735095
9:32:56 AM: [Interface:TryOpenConnection] Local connection established
9:32:57 AM: Measured CPU frequency: 3.34 GHz
9:32:57 AM: FStaticLightingSystem started using GKDOPMaxTrisPerLeaf: 4
9:32:57 AM: Number of texture mappings: 51
9:32:57 AM: Number of fluid mappings: 0
9:32:57 AM: Number of landscape mappings: 0
9:32:57 AM: Number of BSP mappings: 0
9:32:57 AM: Number of static mesh instance mappings: 51
9:32:57 AM: Reserving memory for 51 meshes, 35986 vertices, 39874 triangles
9:32:57 AM: Scene surface area calculated at 46.295 million units (45.638% of the estimated 101.440 million units)
9:32:57 AM: Importance volume surface area calculated at 46.140 million units (70.582% of the estimated 65.371 million units)
9:32:57 AM: Preallocated 0.0Gb for kDOP nodes and triangles
9:32:57 AM: Building kDOP took 0.02 seconds.
9:32:57 AM: Static lighting kDOP: 11894 nodes, 5948 leaves, 23792 triangles, 24730 vertices
9:32:57 AM: Static lighting kDOP: 22.751% wasted space in leaves
9:32:57 AM: kDopTree.Nodes : 1.3Mb
9:32:57 AM: kDopTree.SOATriangles : 1.6Mb
9:32:57 AM: kDOPTriangles : 0.0Mb
9:32:57 AM: TrianglePayloads : 0.6Mb
9:32:57 AM: MeshInfos : 0.0Mb
9:32:57 AM: Vertices : 0.5Mb
9:32:57 AM: UVs : 0.3Mb
9:32:57 AM: LightmapUVs : 0.3Mb
9:32:57 AM: Static lighting kDOP: 11894 nodes, 5948 leaves, 23792 triangles, 24730 vertices, 4.6 Mb
9:32:57 AM: Processing…
9:32:57 AM: EmitDirectPhotons complete, 0.029 million photons emitted in 0.1 seconds
9:32:58 AM: EmitIndirectPhotons complete, 0.221 million photons emitted in 0.2 seconds
9:32:58 AM: Marking Irradiance Photons complete, 0.015 million photons marked in 0.1 seconds
9:33:03 AM: [Ping] Communication with the coordinator failed, job distribution will be disabled until the connection is established
9:33:46 AM: Caching Irradiance Photons complete, 5.638 million cache samples in 48.7 seconds
9:33:50 AM: Calculate Irradiance Photons complete, 0.014 million irradiance calculations in 3.5 seconds
9:42:27 AM: Lighting 18.6%
9:42:30 AM: Lighting 20.9%
9:42:53 AM: Lighting 30.2%
9:49:34 AM: Lighting 40.1%
9:51:58 AM: Lighting 54.0%
9:54:39 AM: Lighting 60.3%
10:05:44 AM: Lighting 74.4%
10:05:44 AM: Lighting 83.7%
10:07:36 AM: [Job] Job is a success!
10:07:36 AM: Lighting 97.7%
10:07:55 AM: [Network] Pinging remote agents…
10:07:55 AM: [Network] Remote Agent ping complete
10:08:02 AM: [Network] Pinging Coordinator…
10:08:02 AM: [Network] Coordinator has failed to respond
10:08:02 AM: [Network] Coordinator ping complete
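
In a log this long the interesting lines are easy to miss, so here is a small Python sketch that pulls out only the entries reporting a coordinator failure. It assumes the log has been saved as plain text in the `H:MM:SS AM: [Tag] message` layout shown above; the function name and the regular expression are my own, not part of Swarm.

```python
import re

# Timestamp, optional [Tag], then the message text.
LINE_RE = re.compile(
    r"^(\d{1,2}:\d{2}:\d{2} [AP]M): (?:\[(\w[\w:]*)\] )?(.*)$"
)


def find_coordinator_failures(log_text):
    """Return (timestamp, tag, message) for lines reporting a coordinator failure."""
    hits = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip blank or unrecognised lines
        timestamp, tag, message = match.groups()
        lowered = message.lower()
        if "coordinator" in lowered and "fail" in lowered:
            hits.append((timestamp, tag, message))
    return hits


if __name__ == "__main__":
    sample = (
        "9:33:03 AM: [Ping] Communication with the coordinator failed, "
        "job distribution will be disabled until the connection is established\n"
        "10:08:02 AM: [Network] Coordinator ping complete\n"
    )
    for entry in find_coordinator_failures(sample):
        print(entry)
```

Run against the log above, this kind of filter would flag the 9:33:03 `[Ping]` entry and the 10:08:02 "failed to respond" entry, which are the two lines that matter here.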

The coordinator ping failure seems like some sort of IP settings / DNS issue, which is a bit over my head as an artist, not an IT person.

Any hints?
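
One way to narrow down an IP/DNS problem like this is to check two things separately: does the coordinator's machine name resolve to an address, and does that address accept a TCP connection? The Python sketch below does both. Treat it as a rough diagnostic under stated assumptions: port 8008 is the commonly cited default for the Swarm coordinator, not something confirmed in this thread, and `check_host` is a hypothetical helper name.

```python
import socket

# Assumption: commonly cited default port for the Swarm coordinator.
COORDINATOR_PORT = 8008


def check_host(hostname, port=COORDINATOR_PORT, timeout=3.0):
    """Resolve hostname, then try a TCP connection to the given port."""
    result = {"hostname": hostname, "address": None,
              "reachable": False, "error": None}
    try:
        # Step 1: name resolution (this is where a DNS problem shows up).
        result["address"] = socket.gethostbyname(hostname)
    except OSError as exc:
        result["error"] = f"name resolution failed: {exc}"
        return result
    try:
        # Step 2: can we actually open a connection to that address?
        with socket.create_connection((result["address"], port), timeout=timeout):
            result["reachable"] = True
    except OSError as exc:
        result["error"] = f"TCP connect failed: {exc}"
    return result


if __name__ == "__main__":
    # ALEXANDER is the machine name from the log; substitute your own.
    print(check_host("ALEXANDER"))
```

If the first step fails, the problem is name resolution; if only the second fails, the name resolves but a firewall or a stopped coordinator is more likely.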

Success at last. I set the DNS to Automatic on both machines and it started working.

This does not solve the main problem yet, though: when I set the quality I want, the build gobbles up more RAM than I have locally and completely kills my machine, to the point that not even the mouse or any other program responds. I have to wait until a build finishes (~30 minutes) to do anything, like check this page or my email.

Is there a way to limit the amount of local RAM used, and push most of the work to the remote machines?

Yes, my setup was not functional until I replaced both Default values with Homegroup.