Swarm nodes keep restarting with a complex scene

I have a fairly large scene that I am trying to build lighting for with Lightmass. When building at Production quality, my Swarm nodes get stuck in a cycle of restarting and never actually contribute to the build. I attached an image: at any point before they reach the ‘processing mappings’ phase, they start over at the ‘lightmass starting’ phase. They do work when I change the lighting quality to Preview, but they consistently fail and restart on Production. I’m also including a snippet from one node’s log at the bottom of this post. It looks like the connection simply dropped for no apparent reason and the node gave up.

These are purpose-built render nodes. They have 64 GB of RAM and never come close to that limit. I’m wondering if anyone else has had similar problems, and whether there is something I’m missing or a setting I can tweak to prevent this from happening.

Thanks in advance,
-Ben


[RemoteInterfaceMonitorThreadProc] Pinging bensmachine at 10.101.9.35 ...
[RemoteInterfaceMonitorThreadProc] Testing connection on bensmachine ...
[RemoteInterfaceMonitorThreadProc] bensmachine is still alive
[RemoteInterfaceMonitorThreadProc] Pinging bensmachine at 10.101.9.35 ...
[RemoteInterfaceMonitorThreadProc] Testing connection on bensmachine ...
[RemoteInterfaceMonitorThreadProc] bensmachine is still alive
[RemoteInterfaceMonitorThreadProc] Pinging bensmachine at 10.101.9.35 ...
Radiosity Iterations 1322.5s with 1744.8Mb of cached data
[RemoteInterfaceMonitorThreadProc] bensmachine has dropped, cleaning up
[SignalConnectionDropped] Connection dropped for bensmachine
[RemoteInterfaceMonitorThreadProc] Monitor thread exiting bensmachine ...
[MaintainConnections] Detected dropped remote connection, cleaning up (2422B2BC)
[CloseConnection] Closing connection 2422B2BC using handle 2422B2BC
[CloseConnection] Connection confirmed for disconnection 2422B2BC
[FlushMessageQueue] Draining message queue for 2422B2BC
[FlushMessageQueue] Lock acquired for 2422B2BC
[FlushMessageQueue] Lock released for 2422B2BC
[FlushMessageQueue] Drain complete for 2422B2BC
[FlushMessageQueue] Draining message queue for 2422B2BC
[FlushMessageQueue] Lock acquired for 2422B2BC
[FlushMessageQueue] Lock released for 2422B2BC
[FlushMessageQueue] Drain complete for 2422B2BC
[FlushMessageQueue] Draining message queue for 2422B2BC
[FlushMessageQueue] Lock acquired for 2422B2BC
[FlushMessageQueue] Lock released for 2422B2BC
[FlushMessageQueue] Drain complete for 2422B2BC
[CloseConnection] Orphaning local connection (2422B2BC is the parent of 2422B2BD)
[CloseConnection] Closing bi-directional remote connection (2422B2BC)
[CloseConnection] Bi-directional remote connection closed (2422B2BC)
[FlushMessageQueue] Draining message queue for 2422B2BC
[FlushMessageQueue] Lock acquired for 2422B2BC
[FlushMessageQueue] Lock released for 2422B2BC
[CloseConnection] Connection disconnected 2422B2BC
[FlushMessageQueue] Drain complete for 2422B2BC
[Maintain Jobs] Waiting for Job to quit before killing: "563E4F80-4137ED1C-75B399AA-4E88C314"
[Maintain Jobs] Killing rogue Job "563E4F80-4137ED1C-75B399AA-4E88C314"
[MaintainConnections] Detected dropped local connection, cleaning up (2422B2BD)
[MaintainConnections] Remote connection has closed (2422B2BC)
[CloseConnection] Closing connection 2422B2BD using handle 2422B2BD
[CloseConnection] Connection confirmed for disconnection 2422B2BD
[FlushMessageQueue] Draining message queue for 2422B2BD
[FlushMessageQueue] Lock acquired for 2422B2BD
[FlushMessageQueue] Lock released for 2422B2BD
[FlushMessageQueue] Drain complete for 2422B2BD
[FlushMessageQueue] Draining message queue for 2422B2BD
[FlushMessageQueue] Lock acquired for 2422B2BD
[FlushMessageQueue] Lock released for 2422B2BD
[FlushMessageQueue] Drain complete for 2422B2BD
[FlushMessageQueue] Draining message queue for 2422B2BD
[FlushMessageQueue] Lock acquired for 2422B2BD
[FlushMessageQueue] Lock released for 2422B2BD
[CloseConnection] Connection disconnected 2422B2BD
[FlushMessageQueue] Drain complete for 2422B2BD
[MaintainConnections] Removed connection 2422B2BC
[GetMessage] Safely returning to 2422B2BD with no message
[MaintainConnections] Local connection has closed (2422B2BD)
[MaintainConnections] Removed connection 2422B2BD
[MaintainConnections] All connections have closed
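
For anyone who wants to check their own agent logs for the same pattern, here is a rough Python sketch that finds each "has dropped" event and reports the last untagged line printed before it. It assumes the log uses the "[Tag] message" layout shown above, and that untagged lines (like the "Radiosity Iterations" one) come from the Lightmass job itself; that second part is my guess, not something I have confirmed. The function name find_drops and the sample list are made up for illustration.

```python
import re

# "[Tag] message" lines, as seen in the log snippet above.
TAGGED = re.compile(r"^\[(?P<tag>[^\]]+)\]\s*(?P<msg>.*)$")
# The specific message the monitor thread prints when a machine drops.
DROPPED = re.compile(r"^(?P<host>\S+) has dropped, cleaning up$")


def find_drops(lines):
    """Return a list of (host, last_untagged_line) for each drop event.

    last_untagged_line is the most recent line without a "[Tag]" prefix
    seen before the drop (assumed to be job output), or None if there
    was none.
    """
    drops = []
    last_untagged = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        tagged = TAGGED.match(line)
        if tagged is None:
            # No "[Tag]" prefix: remember it as the latest job output.
            last_untagged = line
            continue
        dropped = DROPPED.match(tagged.group("msg"))
        if dropped:
            drops.append((dropped.group("host"), last_untagged))
    return drops


# A few lines copied from the log above.
sample = [
    "[RemoteInterfaceMonitorThreadProc] bensmachine is still alive",
    "[RemoteInterfaceMonitorThreadProc] Pinging bensmachine at 10.101.9.35 ...",
    "Radiosity Iterations 1322.5s with 1744.8Mb of cached data",
    "[RemoteInterfaceMonitorThreadProc] bensmachine has dropped, cleaning up",
    "[SignalConnectionDropped] Connection dropped for bensmachine",
]

for host, context in find_drops(sample):
    print(f"{host} dropped; last job output before drop: {context}")
```

On a full log you would pass the lines of the file instead of the sample list, which makes it easier to see whether the drops always follow the same stage of the build.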