So I am attempting to use an EC2 instance on Amazon Web Services to add processing power to my Lightmass builds. This was referenced here, among other places. Like the OP on that page, my local machine doesn’t have the chops to do the lighting build I’m trying to run by itself: I left it running all day, and after 8 hours it had not finished building the scene, let alone started on the actual lighting. To be fair, I am building an 8 km² densely foliated and highly detailed outdoor level.
I was able to set up an EC2 instance, copy the DotNET folder over, and run SwarmAgent on the server. I was also able to get the SwarmAgent instances on both my local and the remote machine to show up in the Coordinator. I made sure both machines were in the same group and that both had “*” as the allowed remote agent names.
However, when I kicked off a lighting build, ONLY the local agent was used. This was true even when I set “avoid local execution” to True and ran the remote agent on an instance with significantly more processors and memory than my local machine.
To make sure it wasn’t merely impatience, I started a new project with a smaller level, thinking that if the full export to Swarm had time to finish, the remote machine might be used; however, even in this case the lighting build used ONLY my local machine. I had the Swarm Coordinator running on my local machine and the Swarm Agent running on both machines when I kicked off the build.
Both machines can ping each other, and both machines have ports 8008 and 8009 open to TCP and UDP traffic. At one point I even had both machines connected over OpenVPN and able to ping each other by hostname.
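Ping only proves that ICMP gets through; it says nothing about whether a TCP connection to the Swarm ports actually succeeds. A minimal Python sketch for checking that from each machine, assuming ports 8008 and 8009 as stated above and a Swarm Agent actually listening on the target. It only tests TCP: UDP is connectionless, so a simple connect cannot confirm it. The function names are hypothetical and not part of any Unreal tooling.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP handshake with host:port completes within timeout."""
    try:
        # create_connection resolves the host name and opens a TCP connection;
        # the connection is closed again as soon as the with-block exits.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, timeouts and unresolvable host names all
        # surface as OSError subclasses.
        return False


def check_swarm_ports(host: str, ports=(8008, 8009), timeout: float = 2.0) -> dict:
    """Check each Swarm port (8008/8009 per this thread); returns {port: bool}."""
    return {port: tcp_port_open(host, port, timeout) for port in ports}
```

For example, calling `check_swarm_ports("<remote host or IP>")` on your local machine against the EC2 instance, and the reverse on the instance, should report True for both ports if the network path is open.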
So this leaves me with several questions I need answered.
First, it was mentioned in another thread on Lightmass that you can break your lighting builds up into smaller chunks. How would I do that?
Second, how do I get the AWS cloud servers to participate in the Lightmass build? What files does the remote machine NEED to run the build?
Third, what determines whether or not a remote agent participates in a Lightmass build?
I am running into the same problem. All of the machines can ping each other and each one shows up in the Swarm Coordinator, so they are all talking to each other, but when I kick off a build, even with ‘avoid local execution’ set to true, it only uses the current machine. I am running Networx on the master machine, and I can see outgoing data spike as soon as I start the build, so it looks like the data is being sent out, but the other machines sit idle. Help would be greatly appreciated here.
I’m pretty sure the problem here is that when the Coordinator sends back an IP for the machine that will do the work (one of the other agents), it sends back the private IP of your EC2 instance. Open the Swarm Agent on your local machine and raise the log level to something like Extra Verbose. You should see a line that says something like “Trying to open a remote connection to …”. That line contains an IP address; if you try to ping it, you won’t be able to, because it’s internal to Amazon. Compare it with the private IP address of your EC2 instance: they should be the same.
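To make that comparison quicker, here is a rough Python sketch that pulls the addresses out of saved log lines and flags the private ones. It assumes the log wording is close to “Trying to open a remote connection to” followed by an IPv4 address; the real format may differ, so adjust the pattern. The helper names are hypothetical. Python’s `ipaddress` `is_private` flag covers the RFC 1918 ranges (EC2’s default VPC uses 172.31.0.0/16) plus a few other non-routable ranges.

```python
import ipaddress
import re

# Assumption: the verbose Swarm Agent log contains a line roughly like
# "Trying to open a remote connection to <ip>"; the exact wording may differ.
REMOTE_CONN_RE = re.compile(
    r"Trying to open a remote connection to\D*?(\d{1,3}(?:\.\d{1,3}){3})"
)


def remote_ips_from_log(lines):
    """Yield each IPv4 address found in 'remote connection' log lines."""
    for line in lines:
        match = REMOTE_CONN_RE.search(line)
        if match:
            yield match.group(1)


def private_remote_ips(lines):
    """Return the non-globally-routable addresses the Coordinator handed out.

    A private address such as an EC2 instance's 172.31.x.x cannot be reached
    from outside its VPC without a VPN or similar tunnel.
    """
    result = []
    for ip in remote_ips_from_log(lines):
        try:
            if ipaddress.ip_address(ip).is_private:
                result.append(ip)
        except ValueError:
            # The regex can match invalid octets such as 999.1.1.1; skip them.
            continue
    return result
```

If `private_remote_ips` returns the EC2 instance’s private address, your local agent is being told to connect to a machine it cannot route to.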
That’s pretty helpful. Do you know of any workarounds to that issue?
I just set up a VPN.
- create a new AWS instance for your VPN (I used Ubuntu)
- connect your computer to the VPN
- connect your lighting build server(s) to the VPN
- connect your coordinator to the VPN
That’s really about it. The servers running the Swarm agents need the right software installed, but other than that you should be good to go. I was able to get it working.
You could also go another route and set up a VPC with hardware VPN access, but I didn’t see any reason to do that for what our team needs. You still need to set up the VPN anyway.
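The VPN approach works because every agent then advertises an address the others can route to. A quick sanity check, sketched in Python under the assumption that your VPN uses 10.8.0.0/24 (the subnet in OpenVPN’s sample server config; yours may differ), is to confirm that the address each machine advertises falls inside the VPN subnet. The function name, machine names and addresses are hypothetical placeholders; the real addresses could come from the Swarm Agent log or each machine’s VPN adapter.

```python
import ipaddress

# Assumption: 10.8.0.0/24 is the subnet from OpenVPN's sample server config;
# substitute whatever your VPN server actually hands out.
VPN_SUBNET = ipaddress.ip_network("10.8.0.0/24")


def hosts_outside_vpn(addresses, subnet=VPN_SUBNET):
    """Return the sorted names of machines whose address is outside the subnet.

    `addresses` maps a machine name (hypothetical labels) to the IP address
    that machine's Swarm Agent is expected to advertise.
    """
    return sorted(
        name
        for name, ip in addresses.items()
        if ipaddress.ip_address(ip) not in subnet
    )
```

Any name it returns is a machine still advertising an address (for example an EC2 private IP) that the other agents may not be able to reach.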
Hello, I’m curious whether you ever got this to work. I would also like to know the specs, or at least the number of cores, of your EC2 instance. I have found that when connecting a computer through Swarm Agent over LAN, if the Coordinator detects even the slightest CPU usage on that client, the client will not be used in the build. Finally, were you using the public or the private IPs of your local and remote machines before switching to a VPN connection?
Yeah, I got this working and it works pretty well. We just use whatever we can get on the spot market, normally something in the c4 range. We build lighting once or twice a month and only one of us builds it, so it’s not a huge deal for us to have to go get an instance.
As for Swarm thinking your computers are busy, just adjust the agent’s settings. One of the drop-downs says something about developer options or config; change the tolerance in there. I set mine up to run no matter what the computer(s) are doing, since they are only used for lighting.
Is there any documentation on how to set up AWS from the beginning? I want to use it for Lightmass builds.
Not that I’ve been able to find; my purpose in starting this thread was to try to build such documentation.
So far it comes down to:
- Create AWS compute instances to run the build
- Install Lightmass on those instances
- Create a VPN service instance in AWS
- Join that VPN from your computer and from each of your build instances
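For the first step, the instances’ security group has to admit the Swarm traffic. A minimal Python sketch that builds ingress rules for the ports mentioned earlier in the thread (8008 and 8009, TCP and UDP) in the `IpPermissions` shape used by the EC2 API; the helper name and the “UE4 Swarm” description are hypothetical. Restricting the source to your VPN subnet or your own public IP, rather than 0.0.0.0/0, is an assumption about sensible practice, not something tested in this thread.

```python
import ipaddress


def swarm_ingress_rules(source_cidr, from_port=8008, to_port=8009):
    """Build EC2 security-group ingress rules for the Swarm ports.

    Returns a list in the shape accepted by the EC2 API's IpPermissions
    parameter. The port range comes from this thread; source_cidr should be
    as narrow as possible (e.g. your VPN subnet).
    """
    # Validate the CIDR early; ip_network raises ValueError on typos such as
    # "10.8.0.0/33" or a network address with host bits set.
    network = ipaddress.ip_network(source_cidr)
    return [
        {
            "IpProtocol": protocol,
            "FromPort": from_port,
            "ToPort": to_port,
            "IpRanges": [{"CidrIp": str(network), "Description": "UE4 Swarm"}],
        }
        for protocol in ("tcp", "udp")
    ]
```

With boto3 installed and credentials configured, the result could be passed as `boto3.client("ec2").authorize_security_group_ingress(GroupId=<your group ID>, IpPermissions=rules)`. If all Swarm traffic runs inside the VPN tunnel, the VPN server’s own port is what must be open instead, so check which rules your setup really needs.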
I tried using AWS but failed miserably. First I wanted to have everything running on the VM, because having Lightmass communicate over the Internet can be problematic. But I couldn’t run UE4: it gave an error saying that DirectX 11 feature level 10.0 is required. I tried to install DirectX every way I found on Google but had no luck. The VM was running Windows Server 2016.
Then I tried the approach mentioned above, running only a Swarm client, but unfortunately Windows Server doesn’t have .NET 3.5 installed, and installing it was very problematic. 90% of the solutions I found online required the installation DVD, which is obviously not available on a cloud VM. I’m not sure whether that or something else was the reason, but modifying the settings in the Swarm client gave errors, and ultimately it wasn’t able to connect to the Coordinator. Too bad; I would have been really interested to see how this machine performs. It was a g3.4xlarge with 16 virtual cores.
I think there might be an issue with choosing the right kind of EC2 instance; the fact that you were running a version of Windows Server screams “not really designed for graphics-intensive applications!” to me.
You’d need to make sure your EC2 instance meets UE4’s system requirements…
Any chance we could trouble you to create some good documentation/tutorials, RuBa?
The installation media needs to be created from a snapshot and attached to the EC2 instance. See the instructions here: Add Windows components using installation media - Amazon Elastic Compute Cloud
That helps a little, I think, but it still doesn’t get us all the way there. Since no one else seems to have done so, I would like to create a detailed, step-by-step how-to for setting up AWS clusters for UE4 Lightmass builds. There are still more questions that need to be answered to get that to work, though!
Honestly, your time would be better spent setting up the GPU Lightmass baker and working with that. A single high-end GPU will outperform 100 CPU instances without breaking a sweat; it’s crazy.
Do you have a link on how to do that…?