I will check out that paper.
As for the imitation learning: the culprit turned out to be the “Reset Agents” function call. Leaving that call out prevents the freeze, and imitation learning works now: