We’ve recently run into trouble getting Swarm Coordinator to distribute tasks across our PCs. It worked perfectly across 4 PCs for a few weeks, but for the last few days it will only build on the local computer, and the others stay “Available, Unassigned” for the duration of the build.
Since it was working previously, and we haven’t updated anything or changed any settings (the IP address is correct; it is blurred out intentionally in the picture), we’re pretty stumped as to why this is happening. The firewall/antivirus isn’t blocking anything. We’ve tried restarting everything, pinging the coordinator and agents multiple times (they respond normally), validating/cleaning the cache, and even building from different computers, but the problem persists. Someone suggested we look through our saved logs, but I’m not sure what to look for. The only thing that looks off to me (I’m not too technical) is this:
Exception rethrown at [0]:
at System.Runtime.Remoting.Proxies.RealProxy.HandleReturnMessage(IMessage reqMsg, IMessage retMsg)
at System.Runtime.Remoting.Proxies.RealProxy.PrivateInvoke(MessageData& msgData, Int32 type)
at SwarmCoordinatorInterface.ISwarmCoordinator.Ping(AgentInfo UpdatedInfo)
at Agent.Agent.PingCoordinator(Boolean ForcePing)
… SwarmCoordinator failed to be initialized
… initializing local performance monitoring subsystem
… initialization successful, SwarmAgent now running
[Ping] Communication with the coordinator failed, job distribution will be disabled until the connection is established
Exception details: System.Net.Sockets.SocketException (0x80004005): A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond
At a glance, it seems your setup is not communicating correctly between machines. Take a look at our Swarm Agent documentation, which gives the technical details of the application.
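Note that a successful ping only shows that the machine answers ICMP; the SocketException in the log above is a TCP connection timing out. A minimal sketch for testing that from an agent machine is below. `can_connect` is a hypothetical helper, not part of Swarm, and the address and port in the usage note are placeholders you would need to replace with your coordinator's actual values.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and name resolution failures.
        return False
```

Running something like `can_connect("192.168.0.10", 8008)` on each agent PC (both values are assumptions here) would tell you whether the failure is at the network level or inside Swarm's own settings.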
Have you double-checked the fields in Swarm labeled ‘AllowedRemoteAgentGroup’ and ‘AllowRemoteAgentNames’? The values you have shown in the picture are different from what I have in my own setup. Have you attempted to set it up again from the start?
Ah, changing the AllowedRemoteAgentGroup to “Default” seems to have done the trick! It’s just a bit odd, as I’m sure it was working before with the previous settings, and that I had already tried exactly this. Oh well, I won’t question a positive outcome haha
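Since the fix came down to a single field, it can help to compare these fields across all machines rather than eyeballing each settings tab. The sketch below assumes the agent's options have been saved as XML with one element per setting, named after the setting; that layout is an assumption, as are the helper names, and the setting names are spelled as they appear in this thread, so check them against your own agent.

```python
import xml.etree.ElementTree as ET

# Setting names as spelled in this thread; verify them in your agent's settings.
SETTINGS = ("AllowedRemoteAgentGroup", "AllowRemoteAgentNames")


def read_settings(xml_text: str, names=SETTINGS) -> dict:
    """Return {setting name: text} for every listed setting found in the XML."""
    root = ET.fromstring(xml_text)
    found = {}
    for element in root.iter():
        if element.tag in names:
            found[element.tag] = (element.text or "").strip()
    return found


def differing_settings(a: dict, b: dict) -> dict:
    """Return {name: (value_a, value_b)} for settings whose values differ."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}
```

Feeding the options text from two machines through `read_settings` and then `differing_settings` would flag a mismatch such as one PC using “Default” and another using a custom group name.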
What worked for me was making sure my DNS settings were set automatically. I had previously set my internet adapter’s DNS settings to Google’s so I could bypass the Australian internet censorship filter, and that was also stopping Swarm from working properly.
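If your agents refer to the coordinator by machine name rather than by IP address, a quick way to see whether a DNS change like this affects you is to check what the name resolves to on each PC. This is only a sketch: `resolve` is a hypothetical helper, and the machine name in the usage note is made up.

```python
import socket


def resolve(hostname: str) -> list:
    """Return the sorted unique IP addresses hostname resolves to, or [] on failure."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        # The name could not be resolved with the current DNS settings.
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})
```

For example, `resolve("RENDER-PC")` returning an empty list on one machine but a LAN address on another would point to a name resolution problem rather than a Swarm one.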
I have a production PC and a small rendering/storage PC on a very simple network. Occasionally I add a small laptop to render slightly faster.
No frills. I am a network administrator and this is a two-computer network. Communications are fine.
I have been using the exact same settings for my Swarm coordination since Dec. 23rd.
Both PCs are using Windows 10. No specific firewalls that should/would interfere.
However, roughly one out of five times, even though all PCs are showing in Swarm Coordinator, the lighting build only happens locally. All of the settings are exactly the same. The only “fix” is to close and re-open the Swarm Agents on all systems. Then it works just fine.
It happened just now. The render PC and laptop ignored the job entirely. I restarted all agents, and they immediately picked up the job. I also make a habit of cleaning/validating caches whenever I open the agents… So it’s possible an issue is occurring there?
Something important to point out:
Make sure that the Swarm Coordinator ISN’T running on your workstation - it needs to be running on one of the networked PCs. Your workstation should only have the Swarm Agent.
Also, the article mentions that you need to exit the agent on your workstation and let UE4 launch the Swarm Agent, but I found that didn’t work. So I started the Swarm Agent myself and left it running on my workstation.