Automated Server Performance & Load Testing

I’m looking for best practices on automated testing for game server process performance.

What I want to test:

  • Resource utilization (CPU, memory, network, etc.) per connected client.
  • Resource utilization as it changes across a number of variables such as number of clients, reconnects, uptime, game activities, etc.
  • Optimal number of server processes per “physical server”, e.g., across various AWS EC2 instance types
  • Performance stability over long uptimes
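For the first and third goals, one way to turn raw samples into numbers is to fit resource usage against client count and then divide an instance's capacity by the predicted per-process cost. The sketch below assumes usage grows roughly linearly with clients (the intercept is the idle-process baseline, the slope is the marginal cost per client). That is a simplification: replication or relevancy work can scale worse than linearly, so the fit should be checked against the data. `Sample`, `per_client_cost`, `processes_per_instance`, and the 80% `headroom` default are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass

@dataclass
class Sample:
    clients: int    # connected clients when the sample was taken
    cpu_pct: float  # server process CPU, as % of one core
    rss_mb: float   # server process resident memory, MB

def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two samples")
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("client count must vary across samples")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def per_client_cost(samples):
    """Return {'cpu': (baseline, per_client), 'mem': (baseline, per_client)}."""
    xs = [s.clients for s in samples]
    return {
        "cpu": fit_line(xs, [s.cpu_pct for s in samples]),
        "mem": fit_line(xs, [s.rss_mb for s in samples]),
    }

def processes_per_instance(cost, clients_per_process, vcpus, mem_mb, headroom=0.8):
    """How many processes, each hosting clients_per_process clients, fit on one instance."""
    cpu_base, cpu_slope = cost["cpu"]
    mem_base, mem_slope = cost["mem"]
    cpu_per_proc = cpu_base + cpu_slope * clients_per_process  # % of one core
    mem_per_proc = mem_base + mem_slope * clients_per_process  # MB
    by_cpu = int(vcpus * 100 * headroom // cpu_per_proc)
    by_mem = int(mem_mb * headroom // mem_per_proc)
    return min(by_cpu, by_mem)  # the scarcer resource sets the limit
```

For example, samples showing 5% CPU and 500 MB idle, growing by 1% CPU and 10 MB per client, give `processes_per_instance(cost, 20, vcpus=4, mem_mb=16384) == 12`: CPU, not memory, is the binding constraint on that hypothetical 4 vCPU / 16 GB instance. Running the same fit against uptime instead of client count is a cheap first check for leaks.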

My naive approach would be something like:

  • Develop a scriptable “bot” client that can connect to the server and perform some number of actions, such as login, reconnect, load maps, basic interaction with game actors to load objects into memory, etc.
  • Using a CI/CD tool like Jenkins, spin up the game server process under test along with the required number of clients, running on separate VMs
  • The clients run through some scripted test plans / player flows
  • Collect metrics from the server process and VMs
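The load-driving half of this plan can be sketched independently of Unreal: many lightweight bots run a scripted player flow concurrently, with randomized think times and a ramp-up so they do not all connect at once. In the sketch below, `perform` is a hypothetical stub that only sleeps; a real bot would send the matching request to the game server there, and what that client looks like for an Unreal server is exactly the open question in this post. `PLAYER_FLOW`, the step names, and the think times are illustrative assumptions.

```python
import asyncio
import random
import time
from dataclasses import dataclass, field

# Hypothetical scripted player flow: (step name, mean think time in seconds after the step).
PLAYER_FLOW = [("login", 1.0), ("load_map", 2.0), ("interact", 0.5), ("reconnect", 1.0)]

@dataclass
class Results:
    latencies: dict = field(default_factory=dict)  # step name -> list of durations (s)
    errors: int = 0

async def perform(bot_id, step, rng):
    """Stand-in for a real client action. A real bot would send the
    corresponding request to the game server here; this stub only sleeps
    to simulate a round trip."""
    await asyncio.sleep(rng.uniform(0.0, 0.01))

async def run_bot(bot_id, flow, results, think_scale, rng):
    for step, think in flow:
        start = time.perf_counter()
        try:
            await perform(bot_id, step, rng)
        except Exception:
            results.errors += 1  # count the failure and move on to the next step
            continue
        results.latencies.setdefault(step, []).append(time.perf_counter() - start)
        # Exponentially distributed think time so bots do not act in lockstep.
        delay = rng.expovariate(1.0 / think) * think_scale if think > 0 else 0.0
        await asyncio.sleep(delay)

async def run_swarm(n_bots, flow=PLAYER_FLOW, ramp_s=0.0, think_scale=1.0, seed=0):
    """Start n_bots concurrent bots, spreading their start times over ramp_s seconds."""
    rng = random.Random(seed)
    results = Results()
    tasks = []
    for i in range(n_bots):
        tasks.append(asyncio.create_task(run_bot(i, flow, results, think_scale, rng)))
        if ramp_s > 0:
            await asyncio.sleep(ramp_s / n_bots)  # stagger connections instead of a thundering herd
    await asyncio.gather(*tasks)
    return results
```

A CI job could then run something like `asyncio.run(run_swarm(200, ramp_s=60))` on each client VM. Using asyncio rather than one OS process per bot is a design choice that lets a single VM simulate many players, provided the real client logic is lightweight enough to run this way.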

My questions and concerns are:

Is this a reasonable approach? Are there best practices within the Unreal developer community about testing server performance?

Are there any best practices for writing such test bots? Is there any decent documentation on this approach? Many of the gameplay frameworks seem to make this particularly difficult. For example, I can’t use the AI system because it runs on the server. Similarly, the automated test functionality doesn’t really fit, because I want to simulate a production environment and test realistic load conditions. Do I really need to write a client (largely) from scratch?

There seem to be a number of tools, such as profilers, for testing client performance, but little to nothing for the server side. Is this correct? I can, of course, use general process inspection metrics, but I’m a bit surprised I haven’t found more.
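As a baseline for those general process inspection metrics, a dedicated server's CPU and memory can be sampled from outside the process. This sketch assumes a Linux host and reads `/proc/<pid>/stat` (utime and stime, fields 14 and 15 per proc(5)) and the `VmRSS` line of `/proc/<pid>/status`. The function names are hypothetical; the parsing is split out so it can be tested without a live process.

```python
import os
import time

def parse_stat_cpu_ticks(stat_text):
    """Return utime + stime, in clock ticks, from /proc/<pid>/stat contents."""
    # Field 2 (comm) is in parentheses and may contain spaces or ')', so split after the last ')'.
    rest = stat_text[stat_text.rindex(")") + 1:].split()
    # rest[0] is field 3 (state); utime is field 14 and stime field 15 -> indices 11 and 12.
    return int(rest[11]) + int(rest[12])

def parse_status_rss_kb(status_text):
    """Return resident set size in kB from /proc/<pid>/status contents."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    raise ValueError("VmRSS not found")

def read_sample(pid):
    with open(f"/proc/{pid}/stat") as f:
        ticks = parse_stat_cpu_ticks(f.read())
    with open(f"/proc/{pid}/status") as f:
        rss_kb = parse_status_rss_kb(f.read())
    return time.monotonic(), ticks, rss_kb

def sample_process(pid, interval_s=1.0, count=5):
    """Return a list of (cpu % of one core, rss MB) pairs, one per interval."""
    clk_tck = os.sysconf("SC_CLK_TCK")
    prev = read_sample(pid)
    out = []
    for _ in range(count):
        time.sleep(interval_s)
        cur = read_sample(pid)
        wall = cur[0] - prev[0]
        cpu_pct = 100.0 * (cur[1] - prev[1]) / clk_tck / wall
        out.append((cpu_pct, cur[2] / 1024.0))
        prev = cur
    return out
```

Logging these samples alongside the bot swarm's current client count produces exactly the `(clients, cpu, memory)` series needed to estimate per-client cost. Engine-level detail (which subsystems consume the time) still requires a profiler attached to the server.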

There are several old unanswered questions about this in the AnswerHub and forums. It seems clear to me this is a problem many in the community have faced, but I can’t find documented solutions. Is it really that difficult? Or is it so blindly simple that I’m just missing the “click here to run performance tests” button?

Thank you for any advice!

Meant to add this link, as this is perhaps the best documentation I’ve found so far:

Turns out there’s a great discussion on this topic on UDN for those that have access: