Update:
I ended up customizing the plugin code.
I added an event delegate that Blueprints can bind to in order to be notified when a new policy is received. This lets me keep a “generation” counter that tracks how many times the model has been trained.
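For anyone curious, here is a rough sketch of the delegate idea in plain C++. It is not the plugin's actual code: `PolicyReceivedEvent`, `Trainer`, `ReceivePolicy` and `GenerationCounter` are names I made up for illustration, and `std::function` stands in for an engine multicast delegate.

```cpp
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a multicast delegate (not the plugin's API).
class PolicyReceivedEvent {
public:
    using Listener = std::function<void()>;

    // Register a callback to run on every broadcast.
    void Add(Listener listener) { listeners_.push_back(std::move(listener)); }

    // Invoke every registered callback in registration order.
    void Broadcast() const {
        for (const auto& listener : listeners_) {
            listener();
        }
    }

private:
    std::vector<Listener> listeners_;
};

// Hypothetical trainer: fires the event each time a new policy arrives.
class Trainer {
public:
    PolicyReceivedEvent OnPolicyReceived;

    void ReceivePolicy() {
        // The real trainer would load the new network weights here (assumed).
        OnPolicyReceived.Broadcast();
    }
};

// Listener that counts received policies as "generations".
struct GenerationCounter {
    int generation = 0;

    // Note: the counter must outlive the trainer, since the lambda captures `this`.
    void Bind(Trainer& trainer) {
        trainer.OnPolicyReceived.Add([this] { ++generation; });
    }
};
```

With this shape, the counter only goes up when a policy actually arrives, so it does not depend on frame timing or on how long an individual training iteration takes.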
I also added the ability to enable and disable agents, so an agent can be excluded from training when needed, for example while it is dead. A disabled agent keeps its episode buffer and gathered experience, which it would lose if I temporarily removed it instead. I do not want dead agents to take part in training: no observation they make or action they take has any effect, so training a model on them is not meaningful. I simply re-enable the actor's agent once it respawns.
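The enable/disable behaviour can be sketched roughly like this in plain C++. Again, this is only an illustration under my own assumptions: `AgentManager`, `AgentRecord`, `SetAgentEnabled` and `RecordStep` are hypothetical names, and a vector of floats stands in for the real episode buffer.

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical per-agent state; the float vector stands in for an episode buffer.
struct AgentRecord {
    bool enabled = true;
    std::vector<float> episodeBuffer;
};

// Hypothetical manager (not the plugin's API) that skips disabled agents
// without discarding what they have gathered so far.
class AgentManager {
public:
    void AddAgent(int id) { agents_[id] = AgentRecord{}; }

    // Toggle participation; the buffer is left untouched either way.
    void SetAgentEnabled(int id, bool enabled) {
        auto it = agents_.find(id);
        if (it != agents_.end()) {
            it->second.enabled = enabled;
        }
    }

    // Append one step of experience for every enabled agent only.
    void RecordStep(float reward) {
        for (auto& entry : agents_) {
            if (entry.second.enabled) {
                entry.second.episodeBuffer.push_back(reward);
            }
        }
    }

    // Number of stored steps for an agent, or 0 if the id is unknown.
    std::size_t BufferSize(int id) const {
        auto it = agents_.find(id);
        return it == agents_.end() ? 0 : it->second.episodeBuffer.size();
    }

private:
    std::unordered_map<int, AgentRecord> agents_;
};
```

The point of the flag is that disabling is a pause rather than a removal: a dead agent stops contributing steps, and when it respawns it continues from the buffer it already had.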
If these additions would be useful for the main branch, I am more than happy to contribute them to the plugin's GitHub repository. Please let me know if a contribution is wanted or helpful!