Have you considered using a behavior tree combined with learning agents?
You could use the navmesh and normal AI navigation until the agent reaches a goal where it actually needs the ML, and only then flip your neural network on.
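To make the handoff idea concrete, here is a rough, engine-agnostic sketch in Python. Everything in it is made up for illustration: `HybridController`, `handoff_radius`, `straight_line_nav`, and `dummy_policy` are hypothetical names, the straight-line mover is only a stand-in for real navmesh pathfinding, and the policy is a placeholder for whatever your trained network outputs. In an actual engine the switch would more likely live in a behavior tree branch or decorator.

```python
import math
from dataclasses import dataclass
from typing import Callable, Tuple

Vec2 = Tuple[float, float]


@dataclass
class HybridController:
    """Picks between scripted navigation and a learned policy each tick."""
    navigate: Callable[[Vec2, Vec2], Vec2]  # stand-in for navmesh movement (assumption)
    policy: Callable[[Vec2, Vec2], Vec2]    # stand-in for the neural network (assumption)
    handoff_radius: float = 5.0             # hypothetical distance at which ML takes over

    def mode(self, position: Vec2, goal: Vec2) -> str:
        # Use ML only once the agent is close enough to the goal.
        return "ml" if math.dist(position, goal) <= self.handoff_radius else "nav"

    def act(self, position: Vec2, goal: Vec2) -> Vec2:
        if self.mode(position, goal) == "ml":
            return self.policy(position, goal)
        return self.navigate(position, goal)


def straight_line_nav(position: Vec2, goal: Vec2) -> Vec2:
    """Placeholder for navmesh navigation: unit vector toward the goal."""
    dx, dy = goal[0] - position[0], goal[1] - position[1]
    dist = math.hypot(dx, dy)
    return (0.0, 0.0) if dist == 0 else (dx / dist, dy / dist)


def dummy_policy(position: Vec2, goal: Vec2) -> Vec2:
    """Placeholder for the trained policy; a real one would run inference here."""
    return (0.0, 0.0)


controller = HybridController(straight_line_nav, dummy_policy, handoff_radius=5.0)
far_mode = controller.mode((0.0, 0.0), (10.0, 0.0))   # outside the radius
near_mode = controller.mode((7.0, 0.0), (10.0, 0.0))  # inside the radius
```

The nice part of splitting it this way is that the network only has to learn the hard bit near the goal, not general pathfinding.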
This is my 2 cents without doing a deep dive into your project.
Thanks!