It's entirely possible I am making a mistake in my code, but I believe Imitation Learning may not be properly applying the Policy settings, specifically the size and noise settings of the network to be trained. It appears to use the default settings regardless of what settings you input.
As a minimal example to reproduce, please try the following.
- Set up an imitation learning session with policy settings that differ from the default values (in particular, 4 layers and an Action Noise Min of 0.1). Use a fresh Neural Network asset by clicking Reset Network on the data asset.
- Run training like so
The following values are output to the log. Note that PolicyLayerNum is 3 and PolicyActionNoiseMin is 0.25, neither of which matches the settings provided in step 1:
LogLearning: Display: Training Process: "ObservationVectorDimensionNum": 37,
LogLearning: Display: Training Process: "ActionVectorDimensionNum": 2,
LogLearning: Display: Training Process: "MaxSampleNum": 6001,
LogLearning: Display: Training Process: "PolicyNetworkByteNum": 87636,
LogLearning: Display: Training Process: "PolicyHiddenUnitNum": 128,
LogLearning: Display: Training Process: "PolicyLayerNum": 3,
LogLearning: Display: Training Process: "PolicyActivationFunction": "ELU",
LogLearning: Display: Training Process: "PolicyActionNoiseMin": 0.25,
LogLearning: Display: Training Process: "PolicyActionNoiseMax": 0.25,
LogLearning: Display: Training Process: "IterationNum": 100000,
LogLearning: Display: Training Process: "LearningRateActor": 9.999999747378752e-05,
LogLearning: Display: Training Process: "LearningRateDecay": 0.9900000095367432,
LogLearning: Display: Training Process: "WeightDecay": 0.0010000000474974513,
LogLearning: Display: Training Process: "BatchSize": 128,
LogLearning: Display: Training Process: "Seed": 1234,
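In case it helps anyone confirm the mismatch on their own project, here is a rough Python sketch that parses log lines in the format above and compares them against the values you intended to set. The helper names (`parse_training_log`, `find_mismatches`) and the `EXPECTED` dictionary are my own and purely illustrative; they are not part of Learning Agents. The expected values are the ones from step 1 (4 layers, Action Noise Min of 0.1).

```python
import json
import re

# Prefix that precedes every setting in the pasted log output.
PREFIX = re.compile(r'^LogLearning: Display: Training Process:\s*')


def parse_training_log(lines):
    """Collect the '"Key": value,' pairs from the log lines into a dict."""
    logged = {}
    for line in lines:
        body, count = PREFIX.subn("", line.strip())
        if count == 0:
            continue  # not a training-process line
        body = body.rstrip(",")
        try:
            # Each remaining fragment is a single JSON member, e.g. '"Seed": 1234'.
            logged.update(json.loads("{" + body + "}"))
        except json.JSONDecodeError:
            continue  # skip lines that are not simple key/value pairs
    return logged


def find_mismatches(expected, logged):
    """Return {key: (expected, logged)} for every setting that differs."""
    return {
        key: (value, logged.get(key))
        for key, value in expected.items()
        if logged.get(key) != value
    }


# Lines copied from the log above (shortened to the relevant ones).
LOG_LINES = [
    'LogLearning: Display: Training Process: "PolicyHiddenUnitNum": 128,',
    'LogLearning: Display: Training Process: "PolicyLayerNum": 3,',
    'LogLearning: Display: Training Process: "PolicyActivationFunction": "ELU",',
    'LogLearning: Display: Training Process: "PolicyActionNoiseMin": 0.25,',
]

# Assumption: the values I entered in the policy settings in step 1.
EXPECTED = {"PolicyLayerNum": 4, "PolicyActionNoiseMin": 0.1}

if __name__ == "__main__":
    mismatches = find_mismatches(EXPECTED, parse_training_log(LOG_LINES))
    for key, (want, got) in mismatches.items():
        print(f"{key}: expected {want}, logged {got}")
```

Any setting that is printed is one where the trainer reported a different value than the one configured.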
Furthermore, if you then try to either run inference on the trained network or further train it via reinforcement learning with the same policy settings as in step 1, you get a warning like this:
LogLearning: Warning: BP_FighterLearningPolicy: Neural Network Asset settings don't match those given by PolicySettings
If you instead run inference on the trained network, or further train it via reinforcement learning, using the policy settings that were logged (3 layers, 0.25 min noise), the warning goes away.
TLDR: I think a 3-layer network with 128 hidden units is trained no matter what settings you pass to SetupPolicy in step 1. This only happens in imitation learning; in reinforcement learning everything works fine. If this is just user error on my part, I apologize.