Running into a Python cleanup problem in Learning Agents after migrating a project to a new system.

After running into issues with Learning Agents on a system with an RTX 50-series GPU, I decided to migrate the project to a weaker system with a 3060, which is compatible out of the box. I got the project to run and training starts, but after roughly the second iteration the training fails, as far as I can tell because of a Python cleanup error. Can anyone point me in the right direction to resolve this?
Thanks in advance.

Log Error:

LogLearning: Display: Pushing Experience…
LogLearning: Display: Subprocess: INFO:
LogLearning: Display: Subprocess: Iter: 0 | Avg Rew: 0.00092 | Avg Rew Sum: 1.54185 | Avg Return: -0.14915 | Avg Episode Len: 1587.30159 | Batch Size: 1
LogLearning: Display: Subprocess: INFO: Profile| Logging 3ms GPU Usage| Allocated: 0.02 GiB, Reserved: 0.84 GiB
LogLearning: Display: Subprocess: INFO:
LogLearning: Display: Subprocess: Episode Num: 50 | Total Step Num: 100000
LogLearning: Display: Subprocess: INFO: Profile| Pull Experience 133352ms GPU Usage| Allocated: 0.02 GiB, Reserved: 0.84 GiB
LogLearning: Display: Subprocess: INFO: Done!
LogLearning: Display: Subprocess: INFO: Exiting…
LogLearning: Display: Subprocess: Exception ignored in: <function SharedMemory.__del__ at 0x0000023C01061260>
LogLearning: Display: Subprocess: Traceback (most recent call last):
LogLearning: Display: Subprocess: File "C:\Program Files\Epic Games\UE_5.7\Engine\Plugins\Experimental\LearningAgents\Content\Python\learning_agents\communicators\shared_memory.py", line 193, in __del__
LogLearning: Display: Subprocess: File "C:\Program Files\Epic Games\UE_5.7\Engine\Plugins\Experimental\LearningAgents\Content\Python\learning_agents\communicators\shared_memory.py", line 236, in close
LogLearning: Display: Subprocess: BufferError: cannot close exported pointers exist
LogLearning: Display: Subprocess: Exception ignored in: <function SharedMemory.__del__ at 0x0000023C01061260>
LogLearning: Display: Subprocess: Traceback (most recent call last):
LogLearning: Display: Subprocess: File "C:\Program Files\Epic Games\UE_5.7\Engine\Plugins\Experimental\LearningAgents\Content\Python\learning_agents\communicators\shared_memory.py", line 193, in __del__
LogLearning: Display: Subprocess: File "C:\Program Files\Epic Games\UE_5.7\Engine\Plugins\Experimental\LearningAgents\Content\Python\learning_agents\communicators\shared_memory.py", line 236, in close
LogLearning: Display: Subprocess: BufferError: cannot close exported pointers exist
LogLearning: Display: Subprocess: Exception ignored in: <function SharedMemory.__del__ at 0x0000023C01061260>
LogLearning: Display: Subprocess: Traceback (most recent call last):
LogLearning: Display: Subprocess: File "C:\Program Files\Epic Games\UE_5.7\Engine\Plugins\Experimental\LearningAgents\Content\Python\learning_agents\communicators\shared_memory.py", line 193, in __del__
LogLearning: Display: Subprocess: File "C:\Program Files\Epic Games\UE_5.7\Engine\Plugins\Experimental\LearningAgents\Content\Python\learning_agents\communicators\shared_memory.py", line 236, in close
LogLearning: Display: Subprocess: BufferError: cannot close exported pointers exist
LogLearning: Display: PPOTrainer_0: Trainer completed training.
LogLearning: Display: PPOTrainer_0: Sending configs…
LogLearning: Display: Wrote Config Files to ../../../../../../Projects/Unreal/BAUnreal_Racing/Intermediate/LearningAgents/Training1/Configs. Sending Config Signal…
LogLearning: Display: Sending config signal…
LogLearning: Display: PPOTrainer_0: Sending initial policy…
LogLearning: Error: PPOTrainer_0: Error sending policy to trainer: Unexpected communication received. Check log for additional errors.
LogLearning: Error: PPOTrainer_0: Training has failed. Check log for errors.

Hello there @neoscodex!

I checked with my peers. The fact that the process completes one iteration and then fails suggests your basic setup is working, but something brings it down on the second iteration. From your log, the issue takes place when UE exchanges data with the Python trainer. These are the key lines:

BufferError: cannot close exported pointers exist
Error sending policy to trainer: Unexpected communication received.

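The BufferError means Python still holds active references to a shared memory buffer at the moment that buffer is being closed; the traceback shows this happening in the trainer's own SharedMemory cleanup. UE then receives something it did not expect when it sends the policy, which suggests the two sides are out of sync, and the communication fails. Since you migrated to a weaker GPU, the desync could be related to the slower hardware.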
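To illustrate what that BufferError means, here is a minimal sketch that triggers the same kind of error using only Python's standard `mmap` module. It is an assumption that the plugin's `shared_memory.py` wraps a similar mapping; the function name `demonstrate_buffer_error` is made up for this example and is not part of Learning Agents.

```python
import mmap


def demonstrate_buffer_error():
    """Close a mapping while a view still points into it; return the error text."""
    # Anonymous 1 KiB mapping; it only stands in for the real shared buffer.
    mapping = mmap.mmap(-1, 1024)
    # A memoryview is an "exported pointer" into the mapping.
    view = memoryview(mapping)
    try:
        # The view is still alive, so the mapping refuses to close.
        mapping.close()
        message = None
    except BufferError as exc:
        message = str(exc)
    # Releasing the view first lets the close succeed.
    view.release()
    mapping.close()
    return message


print(demonstrate_buffer_error())
```

On CPython this should print the same "exported pointers exist" text as in your log. The point is only that the error comes from closing a buffer while something still references it, not from the GPU itself.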

The first thing I would test, since this is post-migration, is clearing out any existing data caches. Go to your project's main directory and delete the Intermediate, Saved, and Binaries folders. After that, right-click your .uproject file and select "Generate Visual Studio project files". Once that completes, re-open the project and allow it to rebuild before testing again.
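If you would rather script the folder cleanup than do it by hand, something along these lines could work. This is only a sketch: `clean_project_caches` is a hypothetical helper, not an Unreal tool, and it defaults to a dry run that only lists what it would delete. Regenerating the project files still has to be done from the .uproject context menu afterwards.

```python
import shutil
from pathlib import Path

# Folder names come from the suggestion above; everything else is illustrative.
GENERATED_FOLDERS = ("Intermediate", "Saved", "Binaries")


def clean_project_caches(project_dir, dry_run=True):
    """Return the generated folders found in project_dir; delete them unless dry_run."""
    root = Path(project_dir)
    # Refuse to run anywhere that does not look like a project root.
    if not any(root.glob("*.uproject")):
        raise ValueError(f"No .uproject file found in {root}")
    found = [root / name for name in GENERATED_FOLDERS if (root / name).is_dir()]
    if not dry_run:
        for folder in found:
            shutil.rmtree(folder)
    return found
```

Call it once with the default `dry_run=True` to check the list, then again with `dry_run=False` while the editor is closed.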

Depending on the nature of your project, another suggestion from my peers is to reduce the training workload and see whether the process gets past the second iteration. Either reduce the number of active agents, or adjust your training settings to allow more time between steps.