Issues with UGS and Perforce

anonymous-edc · April 8, 2025, 11:57pm

One of our teams has been having some issues with UE and Perforce. Below are a few bullet points of the issues they are running into.

Can you provide some guidance when you get a chance?

TLDR

The SRE hypothesis is that the p4 changes command issued by UGS creates a significant base load on P4. When larger operations (like creating a branch or doing an integration) land on top of that load, P4 is effectively DDoSed and slows down to the point where it’s unusable by the team and work is blocked.

Question

Are there features of Epic’s P4 network topology/architecture that differ from ours and could explain why this is a problem for us and not for them?
Can we get some background on how Epic has designed their P4 network architecture?

We know there are differences, but we just want to talk broadly about how they have things set up:

Epic use P4 on Linux, IIRC (we are on Windows)
Epic apparently run a very large P4 network (proxies? edge servers?) on AWS

Svegn2 · April 9, 2025, 1:02pm

Hi Adam,

Can you provide more details on what you are observing? UGS does query the server for changes every minute, but that should not be the problem. The queries should be limited to a call to “p4 changes” followed by “p4 describe” for the new CLs. Are your users opening a uproject or uprojectdirs? Do they keep UGS open, or do they start it every morning?

Some logs from your server showing the bad conditions would also be useful.

Martin
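One way to check the "p4 changes" hypothesis against the server log is to count which commands, from which client applications, dominate during a slow period. The sketch below assumes the usual server log layout, where each command start is logged as `date time pid N user@client ip [app/version] 'user-cmd args'`; the sample lines, user names, and the `ExampleTool` application name are made up for illustration, and logs written at higher tracking levels may need extra filtering.

```python
import re
from collections import Counter

# Assumed shape of a command-start line in the server log:
#   <tab>2025/04/08 23:57:00 pid 100 alice@alice-ws 10.0.0.5 [App/1.0] 'user-changes ...'
# Completion lines ("pid 100 completed ...") have no user@client and are skipped.
LINE_RE = re.compile(
    r"^\s*\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2} pid \d+ "
    r"(?P<user>[^@\s]+)@(?P<client>\S+) \S+ \[(?P<app>[^\]]*)\] "
    r"'(?P<cmd>[\w-]+)"
)


def tally_commands(lines):
    """Count commands overall and per (application, command) pair."""
    by_cmd = Counter()
    by_app = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # header, completion, or tracking line
        app_name = match["app"].split("/")[0]
        by_cmd[match["cmd"]] += 1
        by_app[(app_name, match["cmd"])] += 1
    return by_cmd, by_app


# Hypothetical sample data, not taken from a real server.
SAMPLE_LOG = [
    "Perforce server info:",
    "\t2025/04/08 23:57:00 pid 100 alice@alice-ws 10.0.0.5 [ExampleTool/1.0] 'user-changes -m 10 //depot/...'",
    "\t2025/04/08 23:57:01 pid 100 completed .012s",
    "\t2025/04/08 23:57:02 pid 101 bob@bob-ws 10.0.0.6 [p4/2024.1/NTX64/1234567] 'user-describe -s 42'",
    "\t2025/04/08 23:58:00 pid 102 alice@alice-ws 10.0.0.5 [ExampleTool/1.0] 'user-changes -m 10 //depot/...'",
]

if __name__ == "__main__":
    commands, per_app = tally_commands(SAMPLE_LOG)
    for (app, cmd), count in per_app.most_common():
        print(f"{count:6d}  {app:15s} {cmd}")
```

Running the same tally over a slice of the log from a normal hour and from a meltdown makes it easier to see whether polling traffic actually changes, or whether a few heavy commands account for the difference.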

Svegn2 · April 22, 2025, 6:07pm

We have a couple of Edge servers that are used to spread the load. They are located in geographic regions close to higher densities of users (East/West US, Europe…).

You should also make sure you are running a recent update of the server software. The older code made single-file requests, which is detrimental when using OFPA (One File Per Actor). This problem was addressed about 2 years ago, and they also patched some of the previous years’ releases, so the latest update of any release from the last 4-5 years should be fine. Look for job 111623 in the release notes to find a version that contains the fix for the “year” you are using: https://help.perforce.com/helix-core/release-notes/current/relnotes.txt

Based on personal experience, I would recommend monitoring the RAM usage of the server when the problem happens. I have seen cases where a server slowed to a crawl because of excessive paging: there was not enough RAM to fit the DBs. I’m guessing it is much better in a mostly SSD world now, but it is probably still possible.
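As a rough first check for the RAM point above, the total size of the server's metadata tables (the `db.*` files in the server root) can be compared with the machine's physical memory. This is only a sketch: the 0.8 headroom factor is an arbitrary assumption, the RAM figure has to be supplied by hand, and in practice only the actively used tables need to stay cached, so a "does not fit" result is a hint to look at paging, not a diagnosis.

```python
from pathlib import Path


def db_footprint_bytes(p4root):
    """Sum the sizes of the db.* files directly under the server root."""
    root = Path(p4root)
    return sum(p.stat().st_size for p in root.glob("db.*") if p.is_file())


def fits_in_ram(db_bytes, ram_bytes, headroom=0.8):
    """True if the tables fit in the given fraction of RAM.

    headroom is an assumed safety margin for the OS and server processes.
    """
    return db_bytes <= ram_bytes * headroom


if __name__ == "__main__":
    import sys

    # Usage: python db_vs_ram.py <server root> <RAM in GiB>
    root, ram_gib = sys.argv[1], float(sys.argv[2])
    total = db_footprint_bytes(root)
    verdict = "fits" if fits_in_ram(total, ram_gib * 2**30) else "does not fit"
    print(f"db.* total: {total / 2**30:.1f} GiB, {verdict} in {ram_gib:.0f} GiB RAM")
```

Watching the operating system's paging counters during a slowdown is still the more direct evidence; this only indicates whether memory pressure is plausible.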

anonymous-edc · April 21, 2025, 9:23pm

Hi Martin,

I apologize, I was out of the office on PTO. Let me get some more details for you.

Best,

-adam

anonymous-edc · April 22, 2025, 4:29pm

Hey, yes, Adam is handing off to me (I’m the Technical Director on this project). It will take me a couple of days to gather better information for you, as IT holds the keys and has high latency.

I was more “going fishing” here, in the sense of wanting to broadly discuss with you what Epic’s P4 network architecture looks like. I am also skeptical of “the problem is p4 changes load” as a sufficient explanation for our server slowdowns and occasional meltdowns. The thing that would be most immediately helpful to us is an understanding of what your infrastructure looks like, as our server/proxy network seems to be having a very hard time scaling with team usage.
