Tutorial: Learning Agents Introduction

EDIT: I must have done something wrong… All-night training still wasn't working… So I started a new project, followed the modified tutorial at Unreal Learning Agents Tutorial - YouTube, and within 20 minutes of training my drivers already had solid performance around the track :slight_smile:

Next up: tweaking, and making the system work across multiple tracks :smiley:

Old post

Hi @Deathcalibur ,

Can you elaborate on the avg Reward/Return/Value? For example, I've been training the AI for the past 6 hours already, and it's still only applying a little steering and a little throttle… Not a single corner is taken successfully… Here are some logs:

LogLearning: Display: Training Process: Iter:   25116 | Avg Reward: -0.01283 | Avg Return: -3.92119 | Avg Value: -4.39131 | Avg Episode Length: 294.11765
LogLearning: Display: Training Process: Iter:   25194 | Avg Reward: 0.00827 | Avg Return: 2.44789 | Avg Value: -0.44297 | Avg Episode Length: 294.11765
LogLearning: Display: Training Process: Iter:   25272 | Avg Reward: 0.00897 | Avg Return: 2.65678 | Avg Value: 0.39569 | Avg Episode Length: 294.11765
LogLearning: Display: Training Process: Iter:   25350 | Avg Reward: 0.00890 | Avg Return: 2.59105 | Avg Value: 0.21417 | Avg Episode Length: 294.11765
LogLearning: Display: Training Process: Iter:   25428 | Avg Reward: -0.01433 | Avg Return: -2.04743 | Avg Value: -1.74980 | Avg Episode Length: 294.11765
LogLearning: Display: Training Process: Iter:   25506 | Avg Reward: 0.00873 | Avg Return: 2.59401 | Avg Value: -2.59918 | Avg Episode Length: 294.11765
LogLearning: Display: Training Process: Iter:   25584 | Avg Reward: -0.00081 | Avg Return: -0.28876 | Avg Value: -2.95976 | Avg Episode Length: 294.11765
LogLearning: Display: Training Process: Iter:   25662 | Avg Reward: -0.00957 | Avg Return: -2.95773 | Avg Value: -3.81895 | Avg Episode Length: 294.11765
LogLearning: Display: Training Process: Iter:   25740 | Avg Reward: -0.03639 | Avg Return: -10.96862 | Avg Value: -3.31776 | Avg Episode Length: 294.11765
LogLearning: Display: Training Process: Iter:   25818 | Avg Reward: -0.01724 | Avg Return: -4.92911 | Avg Value: -5.24123 | Avg Episode Length: 294.11765
LogLearning: Display: Training Process: Iter:   25896 | Avg Reward: -0.00036 | Avg Return: 0.66710 | Avg Value: -2.78856 | Avg Episode Length: 294.11765
LogLearning: Display: Training Process: Iter:   25974 | Avg Reward: -0.03991 | Avg Return: -10.99330 | Avg Value: -4.33503 | Avg Episode Length: 294.11765
LogLearning: Display: Training Process: Iter:   26052 | Avg Reward: 0.00838 | Avg Return: 2.46531 | Avg Value: -1.31321 | Avg Episode Length: 294.11765
LogLearning: Display: Training Process: Iter:   26130 | Avg Reward: 0.00379 | Avg Return: 1.09917 | Avg Value: -1.93314 | Avg Episode Length: 294.11765
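
For what it's worth, here's a tiny sketch of how I currently understand Reward and Return relate (just my own back-of-the-envelope Python, assuming a constant per-step reward and the DiscountFactor of 0.99 shown further down; the logged Avg Return presumably averages returns over all buffered steps, so I don't expect it to match exactly):

    def discounted_return(per_step_reward, episode_length, gamma=0.99):
        # Sum of gamma**t * r for t = 0..episode_length-1, assuming a constant reward r.
        return per_step_reward * (1.0 - gamma ** episode_length) / (1.0 - gamma)

    # Hypothetical constant rewards in the same ballpark as my logs:
    print(discounted_return(0.009, 294))  # ~0.85
    print(discounted_return(0.40, 294))   # ~37.9

Is that roughly the right mental model?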

Later on, I figured that maybe if I control an AI instance myself and just drive around, it'll help the learners… Not sure if this is true? But for those iterations I got the following:

[2023.10.14-19.38.27:508][387]LogLearning: Display: Training Process: Iter:       0 | Avg Reward: 0.40469 | Avg Return: 97.76918 | Avg Value: 3.02828 | Avg Episode Length: 294.11765
[2023.10.14-19.39.18:220][987]LogLearning: Display: Training Process: Iter:      78 | Avg Reward: 0.24908 | Avg Return: 73.69471 | Avg Value: 4.82161 | Avg Episode Length: 294.11765
[2023.10.14-19.40.09:019][587]LogLearning: Display: Training Process: Iter:     156 | Avg Reward: 0.21927 | Avg Return: 65.33840 | Avg Value: 7.40708 | Avg Episode Length: 294.11765

As you can see, the numbers are way higher than what I had on “auto-pilot” after 25k iterations…

Also, in your example screenshot I only see negative numbers for Reward, Return and Value; should these be negative?

P.S. Here are the parameters when I start play:

[2023.10.14-19.51.41:930][984]PIE: Play in editor total start time 0.341 seconds.
[2023.10.14-19.51.41:949][984]LogLearning: Display: BP_DrivingRLTrainer: Sending / Receiving initial policy...
[2023.10.14-19.51.42:798][984]LogLearning: Display: Training Process: {
[2023.10.14-19.51.42:798][984]LogLearning: Display: Training Process:     "TaskName": "BP_DrivingRLTrainer",
[2023.10.14-19.51.42:798][984]LogLearning: Display: Training Process:     "TrainerMethod": "PPO",
[2023.10.14-19.51.42:798][984]LogLearning: Display: Training Process:     "TrainerType": "SharedMemory",
[2023.10.14-19.51.42:798][984]LogLearning: Display: Training Process:     "TimeStamp": "2023-10-14_19-51-41",
[2023.10.14-19.51.42:798][984]LogLearning: Display: Training Process:     "SitePackagesPath": "E:/Program Files/Epic Games/UE_5.3/Engine/Plugins/Experimental/PythonFoundationPackages/Content/Python/Lib/Win64/site-packages",
[2023.10.14-19.51.42:798][984]LogLearning: Display: Training Process:     "IntermediatePath": "E:/Unreal Projects/AIRacer/Intermediate/LearningAgents",
[2023.10.14-19.51.42:798][984]LogLearning: Display: Training Process:     "PolicyGuid": "{52497813-452A-AD39-6E59-CCA26658E80A}",
[2023.10.14-19.51.42:798][984]LogLearning: Display: Training Process:     "ControlsGuid": "{0439ECD7-4DAC-1A3F-D32A-66960B28EBF1}",
[2023.10.14-19.51.42:798][984]LogLearning: Display: Training Process:     "EpisodeStartsGuid": "{316C1D66-4763-B45B-AA97-8F8FC8B0D1C4}",
[2023.10.14-19.51.42:798][984]LogLearning: Display: Training Process:     "EpisodeLengthsGuid": "{F9868096-4395-4EF1-18CF-D9A7C2C0053D}",
[2023.10.14-19.51.42:798][984]LogLearning: Display: Training Process:     "EpisodeCompletionModesGuid": "{63FFFB91-47E2-D049-E244-EC9DB7DFAD36}",
[2023.10.14-19.51.42:798][984]LogLearning: Display: Training Process:     "EpisodeFinalObservationsGuid": "{409CFBD1-4D0A-3B0E-37D7-F6BF8A87078B}",
[2023.10.14-19.51.42:798][984]LogLearning: Display: Training Process:     "ObservationsGuid": "{09393258-4195-DD95-83AC-1F8BE7355380}",
[2023.10.14-19.51.42:798][984]LogLearning: Display: Training Process:     "ActionsGuid": "{D3C32437-41ED-4EEC-4C3C-EB9A6CF53A76}",
[2023.10.14-19.51.42:798][984]LogLearning: Display: Training Process:     "RewardsGuid": "{F8C0C959-439B-398B-CD37-F7AEF93EC2C8}",
[2023.10.14-19.51.42:798][984]LogLearning: Display: Training Process:     "ObservationVectorDimensionNum": 8,
[2023.10.14-19.51.42:798][984]LogLearning: Display: Training Process:     "ActionVectorDimensionNum": 2,
[2023.10.14-19.51.42:798][984]LogLearning: Display: Training Process:     "MaxEpisodeNum": 1000,
[2023.10.14-19.51.42:798][984]LogLearning: Display: Training Process:     "MaxStepNum": 10000,
[2023.10.14-19.51.42:799][984]LogLearning: Display: Training Process:     "PolicyNetworkByteNum": 72788,
[2023.10.14-19.51.42:799][984]LogLearning: Display: Training Process:     "PolicyHiddenUnitNum": 128,
[2023.10.14-19.51.42:799][984]LogLearning: Display: Training Process:     "PolicyLayerNum": 3,
[2023.10.14-19.51.42:799][984]LogLearning: Display: Training Process:     "PolicyActivationFunction": "ELU",
[2023.10.14-19.51.42:799][984]LogLearning: Display: Training Process:     "PolicyActionNoiseMin": 0.25,
[2023.10.14-19.51.42:799][984]LogLearning: Display: Training Process:     "PolicyActionNoiseMax": 0.25,
[2023.10.14-19.51.42:799][984]LogLearning: Display: Training Process:     "CriticNetworkByteNum": 71240,
[2023.10.14-19.51.42:799][984]LogLearning: Display: Training Process:     "CriticHiddenUnitNum": 128,
[2023.10.14-19.51.42:799][984]LogLearning: Display: Training Process:     "CriticLayerNum": 3,
[2023.10.14-19.51.42:799][984]LogLearning: Display: Training Process:     "CriticActivationFunction": "ELU",
[2023.10.14-19.51.42:799][984]LogLearning: Display: Training Process:     "ProcessNum": 1,
[2023.10.14-19.51.42:799][984]LogLearning: Display: Training Process:     "IterationNum": 1000000,
[2023.10.14-19.51.42:799][984]LogLearning: Display: Training Process:     "LearningRatePolicy": 9.999999747378752e-05,
[2023.10.14-19.51.42:799][984]LogLearning: Display: Training Process:     "LearningRateCritic": 0.0010000000474974513,
[2023.10.14-19.51.42:799][984]LogLearning: Display: Training Process:     "LearningRateDecay": 0.9900000095367432,
[2023.10.14-19.51.42:799][984]LogLearning: Display: Training Process:     "WeightDecay": 0.0010000000474974513,
[2023.10.14-19.51.42:799][984]LogLearning: Display: Training Process:     "InitialActionScale": 0.10000000149011612,
[2023.10.14-19.51.42:799][984]LogLearning: Display: Training Process:     "BatchSize": 128,
[2023.10.14-19.51.42:799][984]LogLearning: Display: Training Process:     "EpsilonClip": 0.20000000298023224,
[2023.10.14-19.51.42:799][984]LogLearning: Display: Training Process:     "ActionRegularizationWeight": 0.0010000000474974513,
[2023.10.14-19.51.42:799][984]LogLearning: Display: Training Process:     "EntropyWeight": 0.009999999776482582,
[2023.10.14-19.51.42:799][984]LogLearning: Display: Training Process:     "GaeLambda": 0.8999999761581421,
[2023.10.14-19.51.42:799][984]LogLearning: Display: Training Process:     "ClipAdvantages": true,
[2023.10.14-19.51.42:799][984]LogLearning: Display: Training Process:     "AdvantageNormalization": true,
[2023.10.14-19.51.42:799][984]LogLearning: Display: Training Process:     "TrimEpisodeStartStepNum": 0,
[2023.10.14-19.51.42:799][984]LogLearning: Display: Training Process:     "TrimEpisodeEndStepNum": 0,
[2023.10.14-19.51.42:799][984]LogLearning: Display: Training Process:     "Seed": 1234,
[2023.10.14-19.51.42:799][984]LogLearning: Display: Training Process:     "DiscountFactor": 0.9900000095367432,
[2023.10.14-19.51.42:799][984]LogLearning: Display: Training Process:     "Device": "GPU",
[2023.10.14-19.51.42:799][984]LogLearning: Display: Training Process:     "UseTensorBoard": false,
[2023.10.14-19.51.42:799][984]LogLearning: Display: Training Process:     "UseInitialPolicyNetwork": true,
[2023.10.14-19.51.42:799][984]LogLearning: Display: Training Process:     "UseInitialCriticNetwork": false,
[2023.10.14-19.51.42:799][984]LogLearning: Display: Training Process:     "SynchronizeCriticNetwork": false,
[2023.10.14-19.51.42:799][984]LogLearning: Display: Training Process:     "LoggingEnabled": true
[2023.10.14-19.51.42:800][984]LogLearning: Display: Training Process: }
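
In case it helps to sanity-check my setup, this is a rough sketch (plain PyTorch, not the actual Learning Agents code) of the network shape I think those settings describe; I'm guessing "PolicyLayerNum": 3 counts the linear layers, since the reported PolicyNetworkByteNum (~72 KB of float32 weights) is in the same ballpark as that reading:

    import torch.nn as nn

    # Shapes taken from the settings above: 8 observation dims, 2 action dims,
    # 128 hidden units, ELU activations. This is only my mental model of the size,
    # not the real Learning Agents network (which e.g. also handles action noise).
    obs_dim, act_dim, hidden = 8, 2, 128

    def mlp(in_dim, out_dim):
        return nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ELU(),
            nn.Linear(hidden, hidden), nn.ELU(),
            nn.Linear(hidden, out_dim),
        )

    policy = mlp(obs_dim, act_dim)  # observations -> steering/throttle
    critic = mlp(obs_dim, 1)        # observations -> value estimate

    # ~17.9k policy weights * 4 bytes ≈ 71.7 KB, roughly matching the log.
    print(sum(p.numel() for p in policy.parameters()))

Does that look about right for this task?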