Hello, so we have our dedicated servers running in a cluster in the cloud, and we would like to be able to use Unreal Insights to profile their performance. What we currently have:
The Unreal trace server running on its own instance
Servers running, using “-tracehost=” to point to the trace server
Network load balancer to connect to the trace server from our studio
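As a concrete sketch of that setup (hostnames and binary names here are placeholders, not taken from this thread):

```shell
# On the trace-server instance: run UTS (default ports 1981 and 1989).
./UnrealTraceServer fork

# On each dedicated server: stream trace data to the trace server.
# "uts.internal" stands in for your trace server's address.
./MyGameServer -tracehost=uts.internal -trace=default

# From the studio: point Insights at the load balancer in front of UTS.
UnrealInsights.exe -Store=lb.mystudio.example:1989
```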
My questions are:
Is this the correct setup? If not, what should I change?
What are the ports 1981, 1985 and 1989 used for specifically? The docs are a bit vague about it.
Does the Insights tool need to connect to the game server AND the trace server?
Yes, using -tracehost= with a running UnrealTraceServer is a valid use case. You can then connect UnrealInsights directly to the respective UnrealTraceServer and browse the available utrace sessions.
An alternative is to capture the trace directly to a local file using -tracefile=<path_to_local_utrace_file> instead of -tracehost. But in that case you need another way to access the respective utrace file from each server.
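As a hypothetical example of that alternative (the paths and hostnames are made up):

```shell
# Write the trace to a local file instead of streaming it to UTS.
./MyGameServer -tracefile=/var/traces/match01.utrace

# The file then has to be collected from each server by some other
# means, e.g. copying it off the instance:
scp gameserver01:/var/traces/match01.utrace ./
```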
The ports used by the TraceLog system, UnrealTraceServer (UTS) and the Unreal Insights app:
1981 is the default TCP port where UTS listens for incoming trace connections. Trace data is sent from the UE runtime to UTS over this TCP connection.
It can be changed by running UTS with --recport=<recorder_port>, but in that case the runtime emitting the trace would also need to use the new port.
1989 is the default TCP port used by UTS for trace store communication (ex: Unreal Insights uses the socket on this port to browse the list of available trace sessions and get/set settings for UTS).
It can be changed by running UTS with --port=<port>. Unreal Insights would also need to have this port specified correctly (UnrealInsights.exe -Store=<host:port>) if the default is changed. By default, Unreal Insights Frontend looks for UTS on localhost:1989.
1985 is the TCP port used by the TraceLog system (i.e. UE runtime on server, game client, editor, etc.) to listen for trace related commands (“SendTo”, “Stop” and “ToggleChannels”). See Engine\Source\Runtime\TraceLog\Private\Trace\Control.cpp. Unreal Insights uses this for establishing connections to running instances (ex. the “Connect” functionality from the “Connection” tab in Unreal Insights Frontend). This port is not used by UTS.
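Putting the configurable ports together, a hypothetical non-default setup could look like this (the hostname is a placeholder):

```shell
# Run UTS with non-default recorder and store ports.
./UnrealTraceServer fork --recport=2981 --port=2989

# Insights must then be told explicitly where the store socket lives
# (by default it looks for UTS on localhost:1989); the runtimes sending
# trace data would likewise need to target the new recorder port.
UnrealInsights.exe -Store=uts.internal:2989
```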
The UE runtime (game server, client, editor, etc.), the Unreal Trace Server (UTS) and the Unreal Insights can each run on different machines.
If you have a utrace file, Unreal Insights can open it directly (i.e. no connection to UTS or game server is needed at all).
Unreal Insights (Frontend) usually needs to connect to UTS (port 1989) to get the list of available trace sessions. Further, Unreal Insights can stream a trace directly from UTS for analysis.
Unreal Insights (Frontend) would need to connect to the game server only if you want to use the “late connect” functionality (i.e. the “Connection” tab / port 1985), i.e. not using -tracehost at all and only starting tracing late when manually invoked from the Unreal Insights UI.
Note that Unreal Insights is not needed at all in the process of recording a trace. It is only needed as a way to visually browse available trace sessions managed by UTS and/or to analyze a certain trace session (either streamed from UTS or directly from a file).
So I can telnet into the server on port 1989, but still can’t connect with Insights. I don’t see anything in the log output, maybe I can make it verbose?
Ok, I figured out the issue. The container was running, but would crash quietly when the server connected to it. Turns out I wasn’t creating a home directory for the `unreal` user, because you usually don’t, but UTS needs that home directory to store trace files. My updated Dockerfile works now.
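The poster’s actual Dockerfile isn’t reproduced here, but a minimal sketch of the relevant fix (creating a home directory for the `unreal` user so UTS has somewhere to put its trace store) might look like this; the base image and paths are assumptions:

```dockerfile
FROM ubuntu:22.04

# UTS stores its traces under the user's home directory, so create the
# user WITH a home directory (-m) rather than as a home-less system user.
RUN useradd -m unreal

COPY UnrealTraceServer /usr/local/bin/UnrealTraceServer

USER unreal
EXPOSE 1981 1989

# Keep UTS in the foreground as the container's main process.
CMD ["/usr/local/bin/UnrealTraceServer", "daemon"]
```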
It would be really cool to be able to pass in the trace file directory as a param so I could use a separate S3 mount to store the trace files, but that’s a nice to have and not blocking.
Ah I see what is happening. The store interface provides a way to read traces from the host. This is used when analyzing a trace file, and in the session browser, which tries to read from each stored trace file until it finds the session info event.
The way this is implemented in UTS is that a “relay” connection is created. That is, a new listen socket is opened on an ephemeral port (32768–60999) and the resulting port number is sent back to Insights, which then transfers the file over a new connection to that port.
Ephemeral ports are not automatically mapped to the outside of the container.
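You can check what “ephemeral” means on a given Linux host or container by reading it from procfs; 32768–60999 is the common default:

```shell
# Print the kernel's ephemeral (local) port allocation range.
cat /proc/sys/net/ipv4/ip_local_port_range

# Hypothetical: publishing the whole range on the UTS container.
# Docker accepts port ranges, but a span this wide is slow to set up,
# which is part of why this isn't an attractive workaround:
#   docker run -p 1981:1981 -p 1989:1989 \
#     -p 32768-60999:32768-60999 my-uts-image
```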
I’m not 100% sure why this design was chosen, but I’ll ask around. I don’t think it’s needed to support multiple connections from the same client, for example.
I don’t know if it’s possible to map that entire range, or what side effects that would have. It doesn’t sound like an ideal solution. Another option could be if we provide a configuration option so that you have a known range of ports to map.
Yeah, that’s a tough one on my end, also can’t easily open up a port range to the cluster. In a perfect world we could set which port to use in settings or from a cli param…
A configurable range should be possible, but requires some work on our end. The concern is that a fixed port range requires active management of the connections in case the server receives more download requests than the mapped range covers. There would have to be some coordination so that there is no chance of “mixing up” connections.
One version of UTS lives in Engine\Source\Programs\UnrealTraceServer. We’re in the process of moving the source to GitHub, so unfortunately it’s not the very latest version, but it should be good enough to test with.
UTS uses Xmake, a different build system, since we wanted to support statically linked C standard libraries on Linux. A simple explanation of how to build is available in Readme.md.
The connections in question are created in `FStoreCborPeer::OnTraceRead`. It creates a `FTraceRelay` instance which calls `StartServer` in its constructor. Notice that it doesn’t set a port (i.e. it will be 0), which delegates to the OS to find a port.
Here one would need to devise some scheme where you roll the port number within a given range (which you can map in the container). The issue, of course, is to never “overlap” connections.