Possible bug in Learning Agents ImitationTrainer

mirrorclimb · September 15, 2023, 6:51pm

It's entirely possible I am making a mistake in my code, but I believe the Imitation Learning trainer may not be applying the Policy settings properly, specifically the size and noise settings of the network to be trained. It appears to use the default settings regardless of what you input.

As a minimal example to reproduce, please try the following.

1. Set up an imitation learning session with policy settings that differ from the default values (in this example, 4 layers and an Action Noise Min of 0.1). Use a fresh Neural Network asset by clicking Reset Network on the data asset.

2. Run training.

3. The following values are output to the log. Note that PolicyLayerNum is 3 and PolicyActionNoiseMin is 0.25, neither of which matches the settings provided in step 1 (4 layers and 0.1).

LogLearning: Display: Training Process:     "ObservationVectorDimensionNum": 37,
LogLearning: Display: Training Process:     "ActionVectorDimensionNum": 2,
LogLearning: Display: Training Process:     "MaxSampleNum": 6001,
LogLearning: Display: Training Process:     "PolicyNetworkByteNum": 87636,
LogLearning: Display: Training Process:     "PolicyHiddenUnitNum": 128,
LogLearning: Display: Training Process:     "PolicyLayerNum": 3,
LogLearning: Display: Training Process:     "PolicyActivationFunction": "ELU",
LogLearning: Display: Training Process:     "PolicyActionNoiseMin": 0.25,
LogLearning: Display: Training Process:     "PolicyActionNoiseMax": 0.25,
LogLearning: Display: Training Process:     "IterationNum": 100000,
LogLearning: Display: Training Process:     "LearningRateActor": 9.999999747378752e-05,
LogLearning: Display: Training Process:     "LearningRateDecay": 0.9900000095367432,
LogLearning: Display: Training Process:     "WeightDecay": 0.0010000000474974513,
LogLearning: Display: Training Process:     "BatchSize": 128,
LogLearning: Display: Training Process:     "Seed": 1234,
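
For anyone who wants to check this without reading the log by eye, here is a minimal sketch in Python that pulls the key/value pairs out of "Training Process" lines like the ones above and reports which ones differ from what was configured. It assumes the log lines keep the format shown here; the function names and the `expected` dictionary are hypothetical and are not part of Learning Agents.

```python
import json
import re

# Matches lines such as:
#   LogLearning: Display: Training Process:     "PolicyLayerNum": 3,
LINE_RE = re.compile(r'Training Process:\s*"(?P<key>\w+)":\s*(?P<value>.+?),?\s*$')


def parse_logged_settings(log_text):
    """Collect the settings the trainer printed to the log into a dict."""
    settings = {}
    for line in log_text.splitlines():
        match = LINE_RE.search(line)
        if match is None:
            continue
        try:
            # Values are JSON scalars (numbers or quoted strings).
            settings[match.group("key")] = json.loads(match.group("value"))
        except json.JSONDecodeError:
            # Skip anything that is not a simple scalar value.
            continue
    return settings


def find_mismatches(expected, logged):
    """Return {key: (expected, logged)} for every setting that differs."""
    return {
        key: (value, logged.get(key))
        for key, value in expected.items()
        if logged.get(key) != value
    }


SAMPLE_LOG = '''
LogLearning: Display: Training Process:     "PolicyHiddenUnitNum": 128,
LogLearning: Display: Training Process:     "PolicyLayerNum": 3,
LogLearning: Display: Training Process:     "PolicyActivationFunction": "ELU",
LogLearning: Display: Training Process:     "PolicyActionNoiseMin": 0.25,
'''

# Hypothetical: the values entered in the policy settings in step 1.
expected = {"PolicyLayerNum": 4, "PolicyActionNoiseMin": 0.1}

mismatches = find_mismatches(expected, parse_logged_settings(SAMPLE_LOG))
for key, (wanted, got) in mismatches.items():
    print(f"{key}: configured {wanted}, trainer logged {got}")
```

With the log above, both PolicyLayerNum and PolicyActionNoiseMin are reported as mismatched, which is the behavior described in this post.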

Furthermore, if you then try to either run inference on the trained network or train it further via reinforcement learning with the same policy settings as in step 1, you will get a warning like this:

LogLearning: Warning: BP_FighterLearningPolicy: Neural Network Asset settings don't match those given by PolicySettings

If you instead run inference on the trained network or train it further via reinforcement learning with the policy settings that were logged (3 layers, 0.25 min noise), the warning goes away.

TLDR: I think a 3-layer network with a UnitNum of 128 is trained no matter what settings you pass to SetupPolicy in step 1. This only happens in Imitation Training; in Reinforcement Learning everything works fine. If this is just user error on my part, I apologize.

Deathcalibur · September 18, 2023, 4:13pm

I’ve been playing around with the 5.3 release version and I think you are correct: this is a bug. I think when you call ImitationTrainer->RunTraining with ReinitPolicy=True, it accidentally gets the default policy back instead of the appropriately sized one.

You may be able to work around this by making a dummy network with the RL Trainer: set up RL with Reinit = true, then stop training right away. Then, in the imitation trainer, you can use that network with Reinit = false. I got an uncaught editor crash when trying this (which is very unfortunate), but I hacked it together really quickly, so I may have messed something up.

I can look at the source code from the 5.3 release and send you instructions on how to patch this if you are interested. Otherwise, this is already somewhat fixed on the UE Main branch, but that’s because we reworked the Python interop.

My intention is to work on the imitation learning tutorial for the driving demo next week, so I may have more info once I get that set up properly.
