Tutorial: Learning Agents Introduction

Hey everyone,

We have finished a quick tutorial that walks you through reinforcement learning with Learning Agents. Please check it out here:

Feel free to post comments about anything that needs clarification! This tutorial was tested with UE 5.3 Preview 1.

Thanks,
Brendan

16 Likes

Let’s gooo! Congrats Brendan! I’m checking it out now. Impressive communication through the development process… it’s refreshing to hear updates as they become available.

1 Like

Hey Brendan, I see you mentioned that the algorithm currently supports only continuous actions and does not support discrete actions. I can see ways to sort of make it work right now, but is there any timeframe for when we can expect proper discrete action support?

We’ll definitely get it in for the next release of Unreal.

You can do discrete actions if you treat the output of the network as probabilities and then sample from them. But when I say “we don’t support discrete actions,” I mainly mean that our training algorithm doesn’t support multiple loss functions, and in this case something like a cross-entropy loss.
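To illustrate the workaround described above, here is a minimal Python sketch (my own illustration, not the Learning Agents API): treat N continuous outputs as unnormalized scores for N discrete choices, softmax them into probabilities, and sample an index each step.

```python
import numpy as np

def sample_discrete_action(continuous_outputs, rng=None):
    """continuous_outputs: shape (num_choices,) raw values taken from the policy's action vector."""
    rng = rng or np.random.default_rng()
    # Softmax with max-subtraction for numerical stability
    z = continuous_outputs - continuous_outputs.max()
    probs = np.exp(z) / np.exp(z).sum()
    # Sample one discrete action index according to those probabilities
    return int(rng.choice(len(probs), p=probs))

# Example: a 3-way choice driven by a 3-dimensional continuous action vector
action_index = sample_discrete_action(np.array([0.2, -1.0, 0.7]))
```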

Is there any plan for a C++ tutorial? I feel like the setup would be a lot clearer in code. Thanks!

1 Like

Hello. I did everything just like in the tutorial but can’t get it working. The network ignores the input data. I’ve tried everything. Can someone help me?

What are you seeing in the Output Log? I need more information to know what is wrong.

Hi!
In view of recent Epic news and layoffs, will Learning Agents be impacted? :frowning:
Thanks

1 Like

No one working on Learning Agents has been laid off, nor has our focus changed.

3 Likes

Great news, thanks!

Sure, here are the logs:
LogLearning: Display: car_meneger_C_UAID_3C7C3F2B6046E7AC01_1426584452: Adding Agent SportsCar_Pawn_C_UAID_3C7C3F2B6046EEAC01_1261901689 with id 0.
LogLearning: Display: car_meneger_C_UAID_3C7C3F2B6046E7AC01_1426584452: Adding Agent SportsCar_Pawn_C_UAID_3C7C3F2B6046EEAC01_1261905690 with id 1.
LogLearning: Display: car_meneger_C_UAID_3C7C3F2B6046E7AC01_1426584452: Adding Agent SportsCar_Pawn_C_UAID_3C7C3F2B6046EEAC01_1261897688 with id 2.
LogLearning: Display: car_meneger_C_UAID_3C7C3F2B6046E7AC01_1426584452: Adding Agent SportsCar_Pawn_C_UAID_3C7C3F2B6046EEAC01_1261892687 with id 3.
LogLearning: Display: car_meneger_C_UAID_3C7C3F2B6046E7AC01_1426584452: Adding Agent SportsCar_Pawn_C_UAID_3C7C3F2B6046EEAC01_1254470679 with id 4.
LogLearning: Display: car_meneger_C_UAID_3C7C3F2B6046E7AC01_1426584452: Adding Agent SportsCar_Pawn_C_UAID_3C7C3F2B6046F5AC01_2072549918 with id 5.
LogLearning: Display: car_meneger_C_UAID_3C7C3F2B6046E7AC01_1426584452: Adding Agent SportsCar_Pawn_C_UAID_3C7C3F2B6046F5AC01_2072524912 with id 6.
LogLearning: Display: car_meneger_C_UAID_3C7C3F2B6046E7AC01_1426584452: Adding Agent SportsCar_Pawn_C_UAID_3C7C3F2B6046E7AC01_1399514451 with id 7.
LogLearning: Display: car_meneger_C_UAID_3C7C3F2B6046E7AC01_1426584452: Adding Agent SportsCar_Pawn_C_UAID_3C7C3F2B6046F5AC01_2072555920 with id 8.
LogLearning: Display: car_meneger_C_UAID_3C7C3F2B6046E7AC01_1426584452: Adding Agent SportsCar_Pawn_C_UAID_3C7C3F2B6046E7AC01_1398330450 with id 9.
LogLearning: Display: car_meneger_C_UAID_3C7C3F2B6046E7AC01_1426584452: Adding Agent SportsCar_Pawn_C_UAID_3C7C3F2B6046F5AC01_2072575926 with id 10.
LogLearning: Display: car_meneger_C_UAID_3C7C3F2B6046E7AC01_1426584452: Adding Agent SportsCar_Pawn_C_UAID_3C7C3F2B6046F5AC01_2072531914 with id 11.
LogLearning: Display: car_meneger_C_UAID_3C7C3F2B6046E7AC01_1426584452: Adding Agent SportsCar_Pawn_C_UAID_3C7C3F2B6046F5AC01_2072528913 with id 12.
LogLearning: Display: car_meneger_C_UAID_3C7C3F2B6046E7AC01_1426584452: Adding Agent SportsCar_Pawn_C_UAID_3C7C3F2B6046E7AC01_1384625448 with id 13.
LogLearning: Display: car_meneger_C_UAID_3C7C3F2B6046E7AC01_1426584452: Adding Agent SportsCar_Pawn_C_UAID_3C7C3F2B6046F5AC01_2072541916 with id 14.
LogLearning: Display: car_meneger_C_UAID_3C7C3F2B6046E7AC01_1426584452: Adding Agent SportsCar_Pawn_C_UAID_3C7C3F2B6046EEAC01_1258820684 with id 15.
LogLearning: Display: car_meneger_C_UAID_3C7C3F2B6046E7AC01_1426584452: Adding Agent SportsCar_Pawn_C_UAID_3C7C3F2B6046EEAC01_1254476680 with id 16.
LogLearning: Display: car_meneger_C_UAID_3C7C3F2B6046E7AC01_1426584452: Adding Agent SportsCar_Pawn_C_UAID_3C7C3F2B6046F5AC01_2072572925 with id 17.
LogLearning: Display: car_meneger_C_UAID_3C7C3F2B6046E7AC01_1426584452: Adding Agent SportsCar_Pawn_C_UAID_3C7C3F2B6046EEAC01_1258828686 with id 18.
LogLearning: Display: car_meneger_C_UAID_3C7C3F2B6046E7AC01_1426584452: Adding Agent SportsCar_Pawn_C_UAID_3C7C3F2B6046F5AC01_2072536915 with id 19.
LogLearning: Display: car_meneger_C_UAID_3C7C3F2B6046E7AC01_1426584452: Adding Agent SportsCar_Pawn_C_UAID_3C7C3F2B6046EEAC01_1254485682 with id 20.
LogLearning: Display: car_meneger_C_UAID_3C7C3F2B6046E7AC01_1426584452: Adding Agent SportsCar_Pawn_C_UAID_3C7C3F2B6046F5AC01_2072559921 with id 21.
LogLearning: Display: car_meneger_C_UAID_3C7C3F2B6046E7AC01_1426584452: Adding Agent SportsCar_Pawn_C_UAID_3C7C3F2B6046F5AC01_2072552919 with id 22.
LogLearning: Display: car_meneger_C_UAID_3C7C3F2B6046E7AC01_1426584452: Adding Agent SportsCar_Pawn_C_UAID_3C7C3F2B6046F5AC01_2072565923 with id 23.
LogLearning: Display: car_meneger_C_UAID_3C7C3F2B6046E7AC01_1426584452: Adding Agent SportsCar_Pawn_C_UAID_3C7C3F2B6046E7AC01_1396728449 with id 24.
LogLearning: Display: car_meneger_C_UAID_3C7C3F2B6046E7AC01_1426584452: Adding Agent SportsCar_Pawn_C_UAID_3C7C3F2B6046EEAC01_1258816683 with id 25.
LogLearning: Display: car_meneger_C_UAID_3C7C3F2B6046E7AC01_1426584452: Adding Agent SportsCar_Pawn_C_UAID_3C7C3F2B6046F5AC01_2072562922 with id 26.
LogLearning: Display: car_meneger_C_UAID_3C7C3F2B6046E7AC01_1426584452: Adding Agent SportsCar_Pawn_C_UAID_3C7C3F2B6046EEAC01_1254480681 with id 27.
LogLearning: Display: car_meneger_C_UAID_3C7C3F2B6046E7AC01_1426584452: Adding Agent SportsCar_Pawn_C_UAID_3C7C3F2B6046F5AC01_2072516911 with id 28.
LogLearning: Display: car_meneger_C_UAID_3C7C3F2B6046E7AC01_1426584452: Adding Agent SportsCar_Pawn_C_UAID_3C7C3F2B6046F5AC01_2072545917 with id 29.
LogLearning: Display: car_meneger_C_UAID_3C7C3F2B6046E7AC01_1426584452: Adding Agent SportsCar_Pawn_C_UAID_3C7C3F2B6046F5AC01_2072569924 with id 30.
LogLearning: Display: car_meneger_C_UAID_3C7C3F2B6046E7AC01_1426584452: Adding Agent SportsCar_Pawn_C_UAID_3C7C3F2B6046EEAC01_1258824685 with id 31.
PIE: Server logged in
PIE: Play in editor total start time 0,425 seconds.
LogLearning: Display: car_trainer: Sending / Receiving initial policy…
LogLearning: Display: Training Process: {
LogLearning: Display: Training Process: “TaskName”: “car_trainer”,
LogLearning: Display: Training Process: “TrainerMethod”: “PPO”,
LogLearning: Display: Training Process: “TrainerType”: “SharedMemory”,
LogLearning: Display: Training Process: “TimeStamp”: “2023-09-29_17-08-26”,
LogLearning: Display: Training Process: “SitePackagesPath”: “C:/Program Files/Epic Games/UE_5.3/Engine/Plugins/Experimental/PythonFoundationPackages/Content/Python/Lib/Win64/site-packages”,
LogLearning: Display: Training Process: “IntermediatePath”: “E:/Unreal_project/AItest/Intermediate/LearningAgents”,
LogLearning: Display: Training Process: “PolicyGuid”: “{0576C3D7-4673-DA39-4A54-A7B54797F1F0}”,
LogLearning: Display: Training Process: “ControlsGuid”: “{52F6ABD3-4420-EBF5-7FC8-2F8237C18AE3}”,
LogLearning: Display: Training Process: “EpisodeStartsGuid”: “{880DA62F-49E0-2411-83E1-08B727F10A90}”,
LogLearning: Display: Training Process: “EpisodeLengthsGuid”: “{54C5AE92-4021-3D06-C433-B89E48B25CAB}”,
LogLearning: Display: Training Process: “EpisodeCompletionModesGuid”: “{0798C354-46B7-51E2-E4E8-42A0BED6CB17}”,
LogLearning: Display: Training Process: “EpisodeFinalObservationsGuid”: “{AF751ECD-4817-7E19-A9CE-928C6676E66F}”,
LogLearning: Display: Training Process: “ObservationsGuid”: “{5F769E65-49F3-358E-376F-6D8812F873CF}”,
LogLearning: Display: Training Process: “ActionsGuid”: “{C1B7918E-428D-CD61-2317-CD910BB20320}”,
LogLearning: Display: Training Process: “RewardsGuid”: “{00D101BD-48C2-6932-2201-B4A815EB8393}”,
LogLearning: Display: Training Process: “ObservationVectorDimensionNum”: 8,
LogLearning: Display: Training Process: “ActionVectorDimensionNum”: 2,
LogLearning: Display: Training Process: “MaxEpisodeNum”: 1000,
LogLearning: Display: Training Process: “MaxStepNum”: 10000,
LogLearning: Display: Training Process: “PolicyNetworkByteNum”: 72788,
LogLearning: Display: Training Process: “PolicyHiddenUnitNum”: 128,
LogLearning: Display: Training Process: “PolicyLayerNum”: 3,
LogLearning: Display: Training Process: “PolicyActivationFunction”: “ELU”,
LogLearning: Display: Training Process: “PolicyActionNoiseMin”: 0,
LogLearning: Display: Training Process: “PolicyActionNoiseMax”: 0,
LogLearning: Display: Training Process: “CriticNetworkByteNum”: 71240,
LogLearning: Display: Training Process: “CriticHiddenUnitNum”: 128,
LogLearning: Display: Training Process: “CriticLayerNum”: 3,
LogLearning: Display: Training Process: “CriticActivationFunction”: “ELU”,
LogLearning: Display: Training Process: “ProcessNum”: 1,
LogLearning: Display: Training Process: “IterationNum”: 1000000,
LogLearning: Display: Training Process: “LearningRatePolicy”: 9.999999747378752e-05,
LogLearning: Display: Training Process: “LearningRateCritic”: 0.0010000000474974513,
LogLearning: Display: Training Process: “LearningRateDecay”: 0.9900000095367432,
LogLearning: Display: Training Process: “WeightDecay”: 0.0010000000474974513,
LogLearning: Display: Training Process: “InitialActionScale”: 0.10000000149011612,
LogLearning: Display: Training Process: “BatchSize”: 128,
LogLearning: Display: Training Process: “EpsilonClip”: 0.20000000298023224,
LogLearning: Display: Training Process: “ActionRegularizationWeight”: 0.0010000000474974513,
LogLearning: Display: Training Process: “EntropyWeight”: 0.009999999776482582,
LogLearning: Display: Training Process: “GaeLambda”: 0.8999999761581421,
LogLearning: Display: Training Process: “ClipAdvantages”: true,
LogLearning: Display: Training Process: “AdvantageNormalization”: true,
LogLearning: Display: Training Process: “TrimEpisodeStartStepNum”: 0,
LogLearning: Display: Training Process: “TrimEpisodeEndStepNum”: 0,
LogLearning: Display: Training Process: “Seed”: 1234,
LogLearning: Display: Training Process: “DiscountFactor”: 0.9900000095367432,
LogLearning: Display: Training Process: “Device”: “GPU”,
LogLearning: Display: Training Process: “UseTensorBoard”: false,
LogLearning: Display: Training Process: “UseInitialPolicyNetwork”: true,
LogLearning: Display: Training Process: “UseInitialCriticNetwork”: false,
LogLearning: Display: Training Process: “SynchronizeCriticNetwork”: false,
LogLearning: Display: Training Process: “LoggingEnabled”: true
LogLearning: Display: Training Process: }
LogLearning: Display: car_meneger_C_UAID_3C7C3F2B6046E7AC01_1426584452: Resetting Agents [0 1 2 3 4 5 6 7 … 31 30 29 28 27 26 25 24].
LogLearning: Display: car_meneger_C_UAID_3C7C3F2B6046E7AC01_1426584452: Resetting Agents [0 1 2 3 4 5 6 7 … 31 30 29 28 27 26 25 24].
LogLearning: Display: Training Process: Creating Replay Buffer…
LogLearning: Display: Training Process: Creating Networks…
LogLearning: Display: Training Process: Receiving Policy…
LogLearning: Display: Training Process: Creating Optimizer…
LogLearning: Display: Training Process: Creating PPO Policy…
LogLearning: Display: Training Process: Opening TensorBoard…
LogLearning: Display: Training Process: Begin Training…
LogLearning: Display: Training Process: Profile| Pull Experience 30452ms
LogLearning: Display: Training Process: Profile| PPO compute returns 718ms
LogLearning: Display: Training Process: Profile| PPO old log prob 235ms
LogLearning: Display: Training Process: Profile| PPO learn 584ms
LogLearning: Display: Training Process: Profile| Training 1538ms
LogLearning: Display: Training Process: Profile| Pushing Policy 1ms
LogLearning: Display: Training Process: Iter: 0 | Avg Reward: -0.00000 | Avg Return: -0.00044 | Avg Value: 0.05228 | Avg Episode Length: 294.11765
LogLearning: Display: car_meneger_C_UAID_3C7C3F2B6046E7AC01_1426584452: Resetting Agents [0 1 2 3 4 5 6 7 … 31 30 29 28 27 26 25 24].
LogLearning: Display: car_meneger_C_UAID_3C7C3F2B6046E7AC01_1426584452: Resetting Agents [0 1 2 3 4 5 6 7 … 31 30 29 28 27 26 25 24].
LogLearning: Display: Training Process: Saving Snapshot…
LogLearning: Display: Training Process: Profile| Logging 2ms
LogLearning: Display: Training Process: Profile| Pull Experience 30707ms
LogLearning: Display: Training Process: Profile| PPO compute returns 25ms
LogLearning: Display: Training Process: Profile| PPO old log prob 50ms
LogLearning: Display: Training Process: Profile| PPO learn 554ms
LogLearning: Display: Training Process: Profile| Training 630ms
LogLearning: Display: Training Process: Profile| Pushing Policy 0ms
LogLearning: Display: Training Process: Iter: 78 | Avg Reward: -0.00000 | Avg Return: -0.00002 | Avg Value: 0.04163 | Avg Episode Length: 294.11765
LogLearning: Display: car_meneger_C_UAID_3C7C3F2B6046E7AC01_1426584452: Resetting Agents [0 1 2 3 4 5 6 7 … 31 30 29 28 27 26 25 24].
LogLearning: Display: car_meneger_C_UAID_3C7C3F2B6046E7AC01_1426584452: Resetting Agents [0 1 2 3 4 5 6 7 … 31 30 29 28 27 26 25 24].
LogLearning: Display: car_meneger_C_UAID_3C7C3F2B6046E7AC01_1426584452: Resetting Agents [0 1 2 3 4 5 6 7 … 31 30 29 28 27 26 25 24].
LogLearning: Display: Training Process: Profile| Logging 0ms
LogLearning: Display: Training Process: Profile| Pull Experience 32020ms
LogLearning: Display: Training Process: Profile| PPO compute returns 22ms
LogLearning: Display: Training Process: Profile| PPO old log prob 43ms
LogLearning: Display: Training Process: Profile| PPO learn 558ms
LogLearning: Display: Training Process: Profile| Training 625ms
LogLearning: Display: Training Process: Profile| Pushing Policy 0ms
LogLearning: Display: Training Process: Iter: 156 | Avg Reward: -0.00000 | Avg Return: -0.00039 | Avg Value: 0.05551 | Avg Episode Length: 294.11765
LogDerivedDataCache: C:/Users/****/AppData/Local/UnrealEngine/Common/DerivedDataCache: Maintenance finished in +00:00:57.212 and deleted 0 files with total size 0 MiB and 0 empty folders. Scanned 30013 files in 30852 folders with total size 3435 MiB.
LogLearning: Display: car_meneger_C_UAID_3C7C3F2B6046E7AC01_1426584452: Resetting Agents [0 1 2 3 4 5 6 7 … 31 30 29 28 27 26 25 24].
LogLearning: Display: car_meneger_C_UAID_3C7C3F2B6046E7AC01_1426584452: Resetting Agents [0 1 2 3 4 5 6 7 … 31 30 29 28 27 26 25 24].
LogLearning: Display: Training Process: Profile| Logging 0ms
LogLearning: Display: Training Process: Profile| Pull Experience 32488ms
LogLearning: Display: Training Process: Profile| PPO compute returns 22ms
LogLearning: Display: Training Process: Profile| PPO old log prob 41ms
LogLearning: Display: Training Process: Profile| PPO learn 556ms
LogLearning: Display: Training Process: Profile| Training 620ms
LogLearning: Display: Training Process: Profile| Pushing Policy 1ms
LogLearning: Display: Training Process: Iter: 234 | Avg Reward: -0.00000 | Avg Return: -0.00033 | Avg Value: 0.04238 | Avg Episode Length: 294.11765
LogLearning: Display: car_meneger_C_UAID_3C7C3F2B6046E7AC01_1426584452: Resetting Agents [0 1 2 3 4 5 6 7 … 31 30 29 28 27 26 25 24].
LogLearning: Display: car_meneger_C_UAID_3C7C3F2B6046E7AC01_1426584452: Resetting Agents [0 1 2 3 4 5 6 7 … 31 30 29 28 27 26 25 24].
LogLearning: Display: Training Process: Profile| Logging 0ms
LogLearning: Display: Training Process: Profile| Pull Experience 31170ms
LogLearning: Display: Training Process: Profile| PPO compute returns 28ms
LogLearning: Display: Training Process: Profile| PPO old log prob 51ms
LogLearning: Display: Training Process: Profile| PPO learn 669ms
LogLearning: Display: Training Process: Profile| Training 748ms
LogLearning: Display: Training Process: Profile| Pushing Policy 1ms
LogLearning: Display: Training Process: Iter: 312 | Avg Reward: -0.00000 | Avg Return: -0.00071 | Avg Value: 0.04619 | Avg Episode Length: 294.11765
LogLearning: Display: car_meneger_C_UAID_3C7C3F2B6046E7AC01_1426584452: Resetting Agents [0 1 2 3 4 5 6 7 … 31 30 29 28 27 26 25 24].
LogLearning: Display: car_meneger_C_UAID_3C7C3F2B6046E7AC01_1426584452: Resetting Agents [0 1 2 3 4 5 6 7 … 31 30 29 28 27 26 25 24].
LogLearning: Display: Training Process: Profile| Logging 0ms
LogLearning: Display: Training Process: Profile| Pull Experience 30875ms
LogLearning: Display: Training Process: Profile| PPO compute returns 23ms
LogLearning: Display: Training Process: Profile| PPO old log prob 75ms
LogLearning: Display: Training Process: Profile| PPO learn 660ms
LogLearning: Display: Training Process: Profile| Training 759ms
LogLearning: Display: Training Process: Profile| Pushing Policy 0ms
LogLearning: Display: Training Process: Iter: 390 | Avg Reward: -0.00000 | Avg Return: -0.00064 | Avg Value: 0.03831 | Avg Episode Length: 294.11765
LogLearning: Display: car_meneger_C_UAID_3C7C3F2B6046E7AC01_1426584452: Resetting Agents [0 1 2 3 4 5 6 7 … 31 30 29 28 27 26 25 24].
LogLearning: Display: car_meneger_C_UAID_3C7C3F2B6046E7AC01_1426584452: Resetting Agents [0 1 2 3 4 5 6 7 … 31 30 29 28 27 26 25 24].
LogLearning: Display: Training Process: Profile| Logging 0ms
LogLearning: Display: Training Process: Profile| Pull Experience 30602ms
LogLearning: Display: Training Process: Profile| PPO compute returns 24ms
LogLearning: Display: Training Process: Profile| PPO old log prob 66ms
LogLearning: Display: Training Process: Profile| PPO learn 621ms
LogLearning: Display: Training Process: Profile| Training 711ms
LogLearning: Display: Training Process: Profile| Pushing Policy 1ms
LogLearning: Display: Training Process: Iter: 468 | Avg Reward: -0.00000 | Avg Return: -0.00026 | Avg Value: 0.03055 | Avg Episode Length: 294.11765
LogLearning: Display: car_meneger_C_UAID_3C7C3F2B6046E7AC01_1426584452: Resetting Agents [0 1 2 3 4 5 6 7 … 31 30 29 28 27 26 25 24].
LogLearning: Display: car_meneger_C_UAID_3C7C3F2B6046E7AC01_1426584452: Resetting Agents [0 1 2 3 4 5 6 7 … 31 30 29 28 27 26 25 24].
LogLearning: Display: Training Process: Profile| Logging 0ms
LogLearning: Display: Training Process: Profile| Pull Experience 31102ms
LogLearning: Display: Training Process: Profile| PPO compute returns 27ms
LogLearning: Display: Training Process: Profile| PPO old log prob 53ms
LogLearning: Display: Training Process: Profile| PPO learn 610ms
LogLearning: Display: Training Process: Profile| Training 691ms
LogLearning: Display: Training Process: Profile| Pushing Policy 1ms
LogLearning: Display: Training Process: Iter: 546 | Avg Reward: -0.00000 | Avg Return: -0.00002 | Avg Value: 0.03677 | Avg Episode Length: 294.11765
LogLearning: Display: car_meneger_C_UAID_3C7C3F2B6046E7AC01_1426584452: Resetting Agents [0 1 2 3 4 5 6 7 … 31 30 29 28 27 26 25 24].
LogLearning: Display: Training Process: Profile| Logging 0ms
LogLearning: Display: Training Process: Profile| Pull Experience 30497ms
LogLearning: Display: Training Process: Profile| PPO compute returns 23ms
LogLearning: Display: Training Process: Profile| PPO old log prob 50ms
LogLearning: Display: Training Process: Profile| PPO learn 674ms
LogLearning: Display: Training Process: Profile| Training 747ms
LogLearning: Display: Training Process: Profile| Pushing Policy 0ms
LogLearning: Display: Training Process: Iter: 624 | Avg Reward: -0.00000 | Avg Return: -0.00046 | Avg Value: 0.04823 | Avg Episode Length: 294.11765
LogLearning: Display: car_meneger_C_UAID_3C7C3F2B6046E7AC01_1426584452: Resetting Agents [0 1 2 3 4 5 6 7 … 31 30 29 28 27 26 25 24].
LogLearning: Display: car_meneger_C_UAID_3C7C3F2B6046E7AC01_1426584452: Resetting Agents [0 1 2 3 4 5 6 7 … 31 30 29 28 27 26 25 24].
LogLearning: Display: car_meneger_C_UAID_3C7C3F2B6046E7AC01_1426584452: Resetting Agents [0 1 2 3 4 5 6 7 … 31 30 29 28 27 26 25 24].
LogLearning: Display: Training Process: Profile| Logging 0ms
LogLearning: Display: Training Process: Profile| Pull Experience 30472ms
LogLearning: Display: Training Process: Profile| PPO compute returns 38ms
LogLearning: Display: Training Process: Profile| PPO old log prob 53ms
LogLearning: Display: Training Process: Profile| PPO learn 621ms
LogLearning: Display: Training Process: Profile| Training 712ms
LogLearning: Display: Training Process: Profile| Pushing Policy 0ms
LogLearning: Display: Training Process: Iter: 702 | Avg Reward: -0.00000 | Avg Return: -0.00016 | Avg Value: 0.04096 | Avg Episode Length: 294.11765
LogLearning: Display: car_meneger_C_UAID_3C7C3F2B6046E7AC01_1426584452: Resetting Agents [0 1 2 3 4 5 6 7 … 31 30 29 28 27 26 25 24].
LogLearning: Display: Training Process: Profile| Logging 0ms
LogLearning: Display: Training Process: Profile| Pull Experience 30463ms
LogLearning: Display: Training Process: Profile| PPO compute returns 38ms
LogLearning: Display: Training Process: Profile| PPO old log prob 47ms
LogLearning: Display: Training Process: Profile| PPO learn 609ms
LogLearning: Display: Training Process: Profile| Training 695ms
LogLearning: Display: Training Process: Profile| Pushing Policy 0ms
LogLearning: Display: Training Process: Iter: 780 | Avg Reward: -0.00000 | Avg Return: -0.00027 | Avg Value: 0.03670 | Avg Episode Length: 294.11765
LogLearning: Display: car_meneger_C_UAID_3C7C3F2B6046E7AC01_1426584452: Resetting Agents [0 1 2 3 4 5 6 7 … 31 30 29 28 27 26 25 24].
LogLearning: Display: car_meneger_C_UAID_3C7C3F2B6046E7AC01_1426584452: Resetting Agents [0 1 2 3 4 5 6 7 … 31 30 29 28 27 26 25 24].
LogLearning: Display: car_meneger_C_UAID_3C7C3F2B6046E7AC01_1426584452: Resetting Agents [0 1 2 3 4 5 6 7 … 31 30 29 28 27 26 25 24].
LogLearning: Display: Training Process: Profile| Logging 0ms
LogLearning: Display: Training Process: Profile| Pull Experience 30782ms
LogLearning: Display: Training Process: Profile| PPO compute returns 23ms
LogLearning: Display: Training Process: Profile| PPO old log prob 41ms
LogLearning: Display: Training Process: Profile| PPO learn 731ms
LogLearning: Display: Training Process: Profile| Training 797ms
LogLearning: Display: Training Process: Profile| Pushing Policy 1ms
LogLearning: Display: Training Process: Iter: 858 | Avg Reward: 0.00000 | Avg Return: 0.00007 | Avg Value: 0.03990 | Avg Episode Length: 294.11765
LogLearning: Display: car_meneger_C_UAID_3C7C3F2B6046E7AC01_1426584452: Resetting Agents [0 1 2 3 4 5 6 7 … 31 30 29 28 27 26 25 24].
LogLearning: Display: Training Process: Profile| Logging 0ms
LogLearning: Display: Training Process: Profile| Pull Experience 30917ms
LogLearning: Display: Training Process: Profile| PPO compute returns 23ms
LogLearning: Display: Training Process: Profile| PPO old log prob 43ms
LogLearning: Display: Training Process: Profile| PPO learn 605ms
LogLearning: Display: Training Process: Profile| Training 672ms
LogLearning: Display: Training Process: Profile| Pushing Policy 1ms
LogLearning: Display: Training Process: Iter: 936 | Avg Reward: -0.00000 | Avg Return: -0.00001 | Avg Value: 0.04052 | Avg Episode Length: 294.11765
LogLearning: Display: car_meneger_C_UAID_3C7C3F2B6046E7AC01_1426584452: Resetting Agents [0 1 2 3 4 5 6 7 … 31 30 29 28 27 26 25 24].
LogLearning: Display: car_meneger_C_UAID_3C7C3F2B6046E7AC01_1426584452: Resetting Agents [0 1 2 3 4 5 6 7 … 31 30 29 28 27 26 25 24].
LogSlate: Updating window title bar state: overlay mode, drag disabled, window buttons hidden, title bar hidden
LogWorld: BeginTearingDown for /Game/VehicleTemplate/Maps/UEDPIE_0_VehicleExampleMap
LogLearning: Display: car_trainer: Stopping training…
LogLearning: Display: Training Process: Saving Snapshot…
LogLearning: Display: Training Process: Profile| Logging 1ms
LogLearning: Display: Training Process: Profile| Pull Experience 18470ms
LogLearning: Display: Training Process: Done!
LogLearning: Display: Training Process: Exiting…
LogWorld: UWorld::CleanupWorld for VehicleExampleMap, bSessionEnded=true, bCleanupResources=true
LogSlate: InvalidateAllWidgets triggered. All widgets were invalidated
LogContentBundle: [VehicleExampleMap(Standalone)] Deleting container.
LogPlayLevel: Display: Shutting down PIE online subsystems
LogSlate: InvalidateAllWidgets triggered. All widgets were invalidated
LogSlate: Updating window title bar state: overlay mode, drag disabled, window buttons hidden, title bar hidden
LogAudioMixer: Deinitializing Audio Bus Subsystem for audio device with ID 3
LogAudioMixer: FMixerPlatformXAudio2::StopAudioStream() called. InstanceID=3
LogAudioMixer: FMixerPlatformXAudio2::StopAudioStream() called. InstanceID=3
LogUObjectHash: Compacting FUObjectHashTables data took 1.54ms
LogPlayLevel: Display: Destroying online subsystem :Context_8

Hi Brendan and the rest of the team!!

I checked out the tutorial and let me tell you that you guys are doing a great job! I especially like how easy it was to add parallel training just by adjusting the maximum number of agents and then placing them on the map.

However, even though I like the simplicity of it, I’d like to know more about the Python code that is being executed in the background. I’ve checked the GitHub repo and didn’t see any .py files in your plugin, so I don’t really know how to access that code.

If I had to guess, I’d say that you guys are using the stable-baselines3 API for the RL algorithms and the Gym API (now Gymnasium) for creating the custom environments. Otherwise, this plugin would be quite a titanic effort if everything were made from scratch.

You already said that discrete actions, more RL algorithms, and CNNs are planned on the roadmap. If I had to guess again, you would also have to develop a way to convert the frames from the cameras into 3D matrices with the RGB data as the input for the CNN, all while retaining the parallelization ability the plugin already has. Quite a challenging scenario!

So, how can I access those Python files, and did I make any mistakes with my assumptions? (I surely did)

Thanks for your efforts! I’m sure this plugin will be the foundation for the NPCs of many future games and many scientific studies too :blush:

Edit: apparently they are using Tianshou, an RL platform that focuses on speed, instead of SB3. I had no idea this existed, but good to know :ok_hand:

1 Like

I believe you are looking for these. They are in the Content folder of the plugin.

1 Like

Oh right, I guess they are not on GitHub. Thanks for clarifying!!

1 Like

Your average rewards and returns are basically zero. If you look at this screenshot from the tutorial, you can see the rewards are much larger:

I would revisit the reward sections of the tutorial/code and see what was missed.
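For reference, here is a rough, hypothetical Python sketch (plain Python, not the Learning Agents API, with made-up helper names and scales) of the kind of per-step driving reward the tutorial aims for: reward speed along the track, penalize distance from the track center. Values that stay at roughly zero usually mean the reward is never actually being added, or its scales make the signal negligible.

```python
import numpy as np

def driving_reward(velocity, track_direction, distance_from_track_center,
                   speed_scale=1000.0, off_track_scale=800.0):
    # Positive reward for velocity projected onto the track direction (forward progress)
    forward_speed = float(np.dot(velocity, track_direction)) / speed_scale
    # Penalty that grows as the car drifts away from the center of the track
    off_track_penalty = distance_from_track_center / off_track_scale
    return forward_speed - off_track_penalty

# Example: moving mostly along the track direction, 120 units off center
reward = driving_reward(np.array([900.0, 50.0, 0.0]), np.array([1.0, 0.0, 0.0]), 120.0)
```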

BTW, I would encourage anyone with issues to open new topics on the forums. I get notifications if you use the “learning-agents” tag and will respond in a timely fashion, or other community members may be able to help. Thanks!

EDIT: I must have done something wrong… After training all night it was still not working… Now, in a new project, I followed the modified tutorial at Unreal Learning Agents Tutorial - YouTube, and within 20 minutes of training my drivers already had solid performance around the track :slight_smile:

Next up: tweaking and making the system compatible with multiple tracks :smiley:

Old post

Hi @Deathcalibur ,

Can you elaborate on the avg. Reward/Return/Value? For example, I’ve been training the AI for the past 6 hours already, and it’s still only doing a little steering and a little throttle… Not one corner is taken successfully… Here are some logs:

LogLearning: Display: Training Process: Iter:   25116 | Avg Reward: -0.01283 | Avg Return: -3.92119 | Avg Value: -4.39131 | Avg Episode Length: 294.11765
LogLearning: Display: Training Process: Iter:   25194 | Avg Reward: 0.00827 | Avg Return: 2.44789 | Avg Value: -0.44297 | Avg Episode Length: 294.11765
LogLearning: Display: Training Process: Iter:   25272 | Avg Reward: 0.00897 | Avg Return: 2.65678 | Avg Value: 0.39569 | Avg Episode Length: 294.11765
LogLearning: Display: Training Process: Iter:   25350 | Avg Reward: 0.00890 | Avg Return: 2.59105 | Avg Value: 0.21417 | Avg Episode Length: 294.11765
LogLearning: Display: Training Process: Iter:   25428 | Avg Reward: -0.01433 | Avg Return: -2.04743 | Avg Value: -1.74980 | Avg Episode Length: 294.11765
LogLearning: Display: Training Process: Iter:   25506 | Avg Reward: 0.00873 | Avg Return: 2.59401 | Avg Value: -2.59918 | Avg Episode Length: 294.11765
LogLearning: Display: Training Process: Iter:   25584 | Avg Reward: -0.00081 | Avg Return: -0.28876 | Avg Value: -2.95976 | Avg Episode Length: 294.11765
LogLearning: Display: Training Process: Iter:   25662 | Avg Reward: -0.00957 | Avg Return: -2.95773 | Avg Value: -3.81895 | Avg Episode Length: 294.11765
LogLearning: Display: Training Process: Iter:   25740 | Avg Reward: -0.03639 | Avg Return: -10.96862 | Avg Value: -3.31776 | Avg Episode Length: 294.11765
LogLearning: Display: Training Process: Iter:   25818 | Avg Reward: -0.01724 | Avg Return: -4.92911 | Avg Value: -5.24123 | Avg Episode Length: 294.11765
LogLearning: Display: Training Process: Iter:   25896 | Avg Reward: -0.00036 | Avg Return: 0.66710 | Avg Value: -2.78856 | Avg Episode Length: 294.11765
LogLearning: Display: Training Process: Iter:   25974 | Avg Reward: -0.03991 | Avg Return: -10.99330 | Avg Value: -4.33503 | Avg Episode Length: 294.11765
LogLearning: Display: Training Process: Iter:   26052 | Avg Reward: 0.00838 | Avg Return: 2.46531 | Avg Value: -1.31321 | Avg Episode Length: 294.11765
LogLearning: Display: Training Process: Iter:   26130 | Avg Reward: 0.00379 | Avg Return: 1.09917 | Avg Value: -1.93314 | Avg Episode Length: 294.11765

Later on, I decided that maybe if I control an AI instance myself and just drive around, it’ll help the learners… I’m not sure if this is true, but for those iterations I got the following:

[2023.10.14-19.38.27:508][387]LogLearning: Display: Training Process: Iter:       0 | Avg Reward: 0.40469 | Avg Return: 97.76918 | Avg Value: 3.02828 | Avg Episode Length: 294.11765
[2023.10.14-19.39.18:220][987]LogLearning: Display: Training Process: Iter:      78 | Avg Reward: 0.24908 | Avg Return: 73.69471 | Avg Value: 4.82161 | Avg Episode Length: 294.11765
[2023.10.14-19.40.09:019][587]LogLearning: Display: Training Process: Iter:     156 | Avg Reward: 0.21927 | Avg Return: 65.33840 | Avg Value: 7.40708 | Avg Episode Length: 294.11765

As you can see, the numbers are way higher than what I had on “auto-pilot” after 25k iterations…

Also, in your example screenshot I only see negative numbers for Reward, Return, and Value. Should these be negative?

P.S. Here are the parameters when I start play:

[2023.10.14-19.51.41:930][984]PIE: Play in editor total start time 0.341 seconds.
[2023.10.14-19.51.41:949][984]LogLearning: Display: BP_DrivingRLTrainer: Sending / Receiving initial policy...
[2023.10.14-19.51.42:798][984]LogLearning: Display: Training Process: {
[2023.10.14-19.51.42:798][984]LogLearning: Display: Training Process:     "TaskName": "BP_DrivingRLTrainer",
[2023.10.14-19.51.42:798][984]LogLearning: Display: Training Process:     "TrainerMethod": "PPO",
[2023.10.14-19.51.42:798][984]LogLearning: Display: Training Process:     "TrainerType": "SharedMemory",
[2023.10.14-19.51.42:798][984]LogLearning: Display: Training Process:     "TimeStamp": "2023-10-14_19-51-41",
[2023.10.14-19.51.42:798][984]LogLearning: Display: Training Process:     "SitePackagesPath": "E:/Program Files/Epic Games/UE_5.3/Engine/Plugins/Experimental/PythonFoundationPackages/Content/Python/Lib/Win64/site-packages",
[2023.10.14-19.51.42:798][984]LogLearning: Display: Training Process:     "IntermediatePath": "E:/Unreal Projects/AIRacer/Intermediate/LearningAgents",
[2023.10.14-19.51.42:798][984]LogLearning: Display: Training Process:     "PolicyGuid": "{52497813-452A-AD39-6E59-CCA26658E80A}",
[2023.10.14-19.51.42:798][984]LogLearning: Display: Training Process:     "ControlsGuid": "{0439ECD7-4DAC-1A3F-D32A-66960B28EBF1}",
[2023.10.14-19.51.42:798][984]LogLearning: Display: Training Process:     "EpisodeStartsGuid": "{316C1D66-4763-B45B-AA97-8F8FC8B0D1C4}",
[2023.10.14-19.51.42:798][984]LogLearning: Display: Training Process:     "EpisodeLengthsGuid": "{F9868096-4395-4EF1-18CF-D9A7C2C0053D}",
[2023.10.14-19.51.42:798][984]LogLearning: Display: Training Process:     "EpisodeCompletionModesGuid": "{63FFFB91-47E2-D049-E244-EC9DB7DFAD36}",
[2023.10.14-19.51.42:798][984]LogLearning: Display: Training Process:     "EpisodeFinalObservationsGuid": "{409CFBD1-4D0A-3B0E-37D7-F6BF8A87078B}",
[2023.10.14-19.51.42:798][984]LogLearning: Display: Training Process:     "ObservationsGuid": "{09393258-4195-DD95-83AC-1F8BE7355380}",
[2023.10.14-19.51.42:798][984]LogLearning: Display: Training Process:     "ActionsGuid": "{D3C32437-41ED-4EEC-4C3C-EB9A6CF53A76}",
[2023.10.14-19.51.42:798][984]LogLearning: Display: Training Process:     "RewardsGuid": "{F8C0C959-439B-398B-CD37-F7AEF93EC2C8}",
[2023.10.14-19.51.42:798][984]LogLearning: Display: Training Process:     "ObservationVectorDimensionNum": 8,
[2023.10.14-19.51.42:798][984]LogLearning: Display: Training Process:     "ActionVectorDimensionNum": 2,
[2023.10.14-19.51.42:798][984]LogLearning: Display: Training Process:     "MaxEpisodeNum": 1000,
[2023.10.14-19.51.42:798][984]LogLearning: Display: Training Process:     "MaxStepNum": 10000,
[2023.10.14-19.51.42:799][984]LogLearning: Display: Training Process:     "PolicyNetworkByteNum": 72788,
[2023.10.14-19.51.42:799][984]LogLearning: Display: Training Process:     "PolicyHiddenUnitNum": 128,
[2023.10.14-19.51.42:799][984]LogLearning: Display: Training Process:     "PolicyLayerNum": 3,
[2023.10.14-19.51.42:799][984]LogLearning: Display: Training Process:     "PolicyActivationFunction": "ELU",
[2023.10.14-19.51.42:799][984]LogLearning: Display: Training Process:     "PolicyActionNoiseMin": 0.25,
[2023.10.14-19.51.42:799][984]LogLearning: Display: Training Process:     "PolicyActionNoiseMax": 0.25,
[2023.10.14-19.51.42:799][984]LogLearning: Display: Training Process:     "CriticNetworkByteNum": 71240,
[2023.10.14-19.51.42:799][984]LogLearning: Display: Training Process:     "CriticHiddenUnitNum": 128,
[2023.10.14-19.51.42:799][984]LogLearning: Display: Training Process:     "CriticLayerNum": 3,
[2023.10.14-19.51.42:799][984]LogLearning: Display: Training Process:     "CriticActivationFunction": "ELU",
[2023.10.14-19.51.42:799][984]LogLearning: Display: Training Process:     "ProcessNum": 1,
[2023.10.14-19.51.42:799][984]LogLearning: Display: Training Process:     "IterationNum": 1000000,
[2023.10.14-19.51.42:799][984]LogLearning: Display: Training Process:     "LearningRatePolicy": 9.999999747378752e-05,
[2023.10.14-19.51.42:799][984]LogLearning: Display: Training Process:     "LearningRateCritic": 0.0010000000474974513,
[2023.10.14-19.51.42:799][984]LogLearning: Display: Training Process:     "LearningRateDecay": 0.9900000095367432,
[2023.10.14-19.51.42:799][984]LogLearning: Display: Training Process:     "WeightDecay": 0.0010000000474974513,
[2023.10.14-19.51.42:799][984]LogLearning: Display: Training Process:     "InitialActionScale": 0.10000000149011612,
[2023.10.14-19.51.42:799][984]LogLearning: Display: Training Process:     "BatchSize": 128,
[2023.10.14-19.51.42:799][984]LogLearning: Display: Training Process:     "EpsilonClip": 0.20000000298023224,
[2023.10.14-19.51.42:799][984]LogLearning: Display: Training Process:     "ActionRegularizationWeight": 0.0010000000474974513,
[2023.10.14-19.51.42:799][984]LogLearning: Display: Training Process:     "EntropyWeight": 0.009999999776482582,
[2023.10.14-19.51.42:799][984]LogLearning: Display: Training Process:     "GaeLambda": 0.8999999761581421,
[2023.10.14-19.51.42:799][984]LogLearning: Display: Training Process:     "ClipAdvantages": true,
[2023.10.14-19.51.42:799][984]LogLearning: Display: Training Process:     "AdvantageNormalization": true,
[2023.10.14-19.51.42:799][984]LogLearning: Display: Training Process:     "TrimEpisodeStartStepNum": 0,
[2023.10.14-19.51.42:799][984]LogLearning: Display: Training Process:     "TrimEpisodeEndStepNum": 0,
[2023.10.14-19.51.42:799][984]LogLearning: Display: Training Process:     "Seed": 1234,
[2023.10.14-19.51.42:799][984]LogLearning: Display: Training Process:     "DiscountFactor": 0.9900000095367432,
[2023.10.14-19.51.42:799][984]LogLearning: Display: Training Process:     "Device": "GPU",
[2023.10.14-19.51.42:799][984]LogLearning: Display: Training Process:     "UseTensorBoard": false,
[2023.10.14-19.51.42:799][984]LogLearning: Display: Training Process:     "UseInitialPolicyNetwork": true,
[2023.10.14-19.51.42:799][984]LogLearning: Display: Training Process:     "UseInitialCriticNetwork": false,
[2023.10.14-19.51.42:799][984]LogLearning: Display: Training Process:     "SynchronizeCriticNetwork": false,
[2023.10.14-19.51.42:799][984]LogLearning: Display: Training Process:     "LoggingEnabled": true
[2023.10.14-19.51.42:800][984]LogLearning: Display: Training Process: }

I checked the code and can’t find what I missed.

Great work on that tutorial!

What are the current limitations of the system with regard to what kinds of AI can be created? Would something like a shooter be possible? E.g., if I wanted to replace the bots in the Lyra example, would it currently be possible to train them to move around, shoot, etc.?

1 Like

With some creativity in how you represent the observation space, it should be possible to train a bot to play Lyra. Figuring out how to represent the game world will be challenging. Starting with some kind of open room with a limited number of obstacles would be doable, since you could query for those objects and feed in the top X as input (basically adding repeating columns with some kind of “IsValid” flag in case you have fewer than your max). The main problem with this approach is that the weights will not be shared across these object encodings :frowning_face:
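A minimal sketch of that padded, fixed-size encoding, assuming a hypothetical budget of K nearby objects (plain Python, not the Learning Agents API): each slot contributes the same columns (relative position plus an is-valid flag), and unused slots are zero-padded.

```python
import numpy as np

def encode_nearby_objects(agent_pos, object_positions, max_objects=8):
    """Returns a flat observation of shape (max_objects * 4,): [dx, dy, dz, is_valid] per slot."""
    # Sort by distance and keep the closest max_objects
    nearest = sorted(object_positions, key=lambda p: np.linalg.norm(p - agent_pos))[:max_objects]
    slots = []
    for i in range(max_objects):
        if i < len(nearest):
            rel = nearest[i] - agent_pos
            slots.append(np.append(rel, 1.0))   # valid slot
        else:
            slots.append(np.zeros(4))           # padded slot, is_valid = 0
    return np.concatenate(slots)

obs = encode_nearby_objects(np.zeros(3), [np.array([100.0, 0.0, 0.0]), np.array([0.0, 250.0, 0.0])])
```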

One other obvious limitation is that the current models only support continuous actions, so you have to come up with some means of translating those actions into shooting or jumping, etc. It won’t work amazingly well until we get discrete actions added. Otherwise, you could possibly blend an NN model which handles the continuous actions with some handwritten AI logic for the discrete ones :thinking:
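One simple (hypothetical, not part of the plugin) way to do that translation is to threshold one channel of the continuous action vector into a discrete game event:

```python
def apply_trigger_action(trigger_value, threshold=0.5):
    """trigger_value: one channel of the continuous action vector, roughly in [-1, 1]."""
    # Fire the discrete action (e.g. shoot or jump) whenever the channel exceeds the threshold
    return trigger_value > threshold

# Example: the policy outputs [steer, throttle, shoot_channel]; only the last is thresholded
should_shoot = apply_trigger_action(0.73)
```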

We are working on making both observations and actions much more flexible in the next release of Learning Agents. :smiley:

2 Likes