Hello, I have some questions regarding the changes made in this commit to enable tracing for parallel polling in Iris: https://github.com/EpicGames/UnrealEngine/commit/da1a30a3f74782a7c3d6d998ae19ee87a0fad3d8#diff-144c2e5a19eb9944eb081dc4525f4b2185aad18bfe7b08ecb3a7085cc91da1d9
I recently modified our engine and Iris to execute the client connection tick phase (`STAT_NetDriver_TickClientConnections`) in parallel.
I’m happy to say that we’ve been running our servers with these modifications without issue for about 2 months now.
We run automated scale tests every day and 2-3 large-scale playtests a week, so I’m fairly comfortable saying it’s free of any obvious bugs.
However, we do have an issue related to traces.
Roughly 2 in 5 traces cause Insights to freeze for several minutes while loading, before the trace finally opens.
The freeze is caused by an extreme number of log statements being flushed to the program’s log file. The log statement in question is:
```cpp
UE_LOG(LogNetTrace, Warning,
    TEXT("PacketContentEvent GameInstanceId: %u, ConnectionId: %u %s, Missing NameIndex: %llu"),
    (uint32)GameInstanceId, (uint32)ConnectionId,
    ConnectionMode ? TEXT("Incoming") : TEXT("Outgoing"),
    DecodedNameOrObjectId);
```
The log file produced when opening such a trace ends up around 3 GB in size for about 1 minute of trace data.
If I disable this log line, Insights opens the file without issue, but since the warning fires at all, I fear some Net Trace data may be missing.
Realistically, we could disable our parallel ticking whenever we need an accurate trace of the net data, but this isn’t ideal.
I’m still testing several things, but I believe the main issue is the use of `NoSync` for the Net Trace Name Event:
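Separately from the root cause, the log volume itself could probably be bounded by reporting each missing name once per game instance and connection, then logging a single summary count when analysis finishes. That would keep the signal that data may be missing without producing multi-gigabyte logs. A minimal standalone sketch of such a filter; `FMissingNameReporter` is a hypothetical helper, not an existing Unreal Engine class:

```cpp
#include <cstddef>
#include <cstdint>
#include <set>
#include <tuple>

// Hypothetical helper (not part of Unreal Engine): decides whether a
// "Missing NameIndex" warning should be emitted, so each
// (GameInstanceId, ConnectionId, NameIndex) triple is reported only once
// instead of once per packet content event.
class FMissingNameReporter
{
public:
    // Records one miss; returns true only the first time this triple is seen.
    bool ShouldReport(uint32_t GameInstanceId, uint32_t ConnectionId, uint64_t NameIndex)
    {
        ++TotalMisses;
        return Reported.emplace(GameInstanceId, ConnectionId, NameIndex).second;
    }

    // Every miss recorded, including the ones whose warning was suppressed.
    uint64_t GetTotalMisses() const { return TotalMisses; }

    // Number of distinct triples that were missing a name.
    std::size_t GetUniqueMisses() const { return Reported.size(); }

private:
    std::set<std::tuple<uint32_t, uint32_t, uint64_t>> Reported;
    uint64_t TotalMisses = 0;
};
```

In the analyzer, the existing `UE_LOG` call could be wrapped in something like `if (Reporter.ShouldReport((uint32)GameInstanceId, (uint32)ConnectionId, DecodedNameOrObjectId))`, with the two totals logged once at the end. This only limits the log output; it does not recover any missing data.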
```cpp
// Trace a name
UE_TRACE_EVENT_BEGIN(NetTrace, NameEvent, NoSync)
    UE_TRACE_EVENT_FIELD(uint32, NameId)
    UE_TRACE_EVENT_FIELD(UE::Trace::AnsiString, Name)
UE_TRACE_EVENT_END()
```
If I understand correctly, Insights cannot locate the name ID referenced by the Packet Content Event, possibly because the Name Event is not being recorded reliably.
To verify this, I will run several scale tests with the flag removed and check whether the issue still occurs.
However, before removing the `NoSync` flag from the event, I would like to ask about the bug mentioned in the original commit.
It mentions that `NoSync` was added to work around a bug in the tracing code itself, but I couldn’t find any information about that bug elsewhere.
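As I understand it (this is an assumption, not something confirmed by the commit), `NoSync` events are written without the serial number that Trace uses to order events across threads, so the analyzer only knows their order relative to other events on the same thread. If that is right, a name first traced on one worker thread and then referenced from another could be analyzed after the packet event that uses it, producing this warning without any data actually being lost. The standalone sketch below models that hypothesis in a simplified way; it is not how Insights’ analyzer is implemented, and all types and functions in it are hypothetical:

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <variant>
#include <vector>

// Simplified, hypothetical stand-ins for the two Net Trace events.
struct NameEvent { uint32_t NameId; std::string Name; };
struct PacketContentEvent { uint32_t NameId; };
using TraceEvent = std::variant<NameEvent, PacketContentEvent>;
using ThreadBuffer = std::vector<TraceEvent>;

// Drains each thread's buffer in turn. Order is preserved within a buffer,
// but nothing orders events across buffers (the assumed NoSync behaviour).
// Returns how many packet events referenced a name not yet seen.
int AnalyzeBuffers(const std::vector<ThreadBuffer>& Buffers)
{
    std::unordered_map<uint32_t, std::string> KnownNames;
    int MissingNameLookups = 0;
    for (const ThreadBuffer& Buffer : Buffers)
    {
        for (const TraceEvent& Event : Buffer)
        {
            if (const NameEvent* Name = std::get_if<NameEvent>(&Event))
            {
                KnownNames[Name->NameId] = Name->Name;
            }
            else if (const PacketContentEvent* Packet = std::get_if<PacketContentEvent>(&Event))
            {
                if (KnownNames.find(Packet->NameId) == KnownNames.end())
                {
                    ++MissingNameLookups; // analogous to "Missing NameIndex"
                }
            }
        }
    }
    return MissingNameLookups;
}

// Worker A traces name 7 and uses it; worker B uses name 7 afterwards in
// wall-clock time. Only the order in which the analyzer drains the buffers differs.
int CountMissingNames(bool bDrainWorkerBFirst)
{
    const ThreadBuffer WorkerA = { NameEvent{7, "ExampleName"}, PacketContentEvent{7} };
    const ThreadBuffer WorkerB = { PacketContentEvent{7}, PacketContentEvent{7} };
    return bDrainWorkerBFirst ? AnalyzeBuffers({WorkerB, WorkerA})
                              : AnalyzeBuffers({WorkerA, WorkerB});
}
```

Draining worker A first resolves every lookup, while draining worker B first reports two missing names even though the name event is present in the trace. If this model is accurate, removing `NoSync` (so the name event can be ordered across threads) should make the warning disappear in the scale tests.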