Hello!
I’m trying to implement nitro (boost) for a car, building on a tutorial.
My goal is to teach the agent how to boost the car for a certain period of time.
The problem I ran into is that the agent toggles the nitro on and off very quickly.
I am new to this field and may not fully understand how to design such conditions correctly, so I would be very grateful for any advice on how to approach this.
I added the following to the tutorial project:
Specify Agent Observation:
Added parameters: whether the boost is on, how long the nitro has been on, and the remaining charge.
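A minimal, engine-agnostic sketch of these three observations in Python. This is not the project code: the names `NitroState` and `nitro_observation` and the maximum values are hypothetical, chosen only to show each value scaled to the 0..1 range before it is passed to the agent.

```python
from dataclasses import dataclass

# Assumed limits, used only to scale the observations (hypothetical values).
MAX_BOOST_TIME = 5.0   # seconds the nitro can plausibly stay on
MAX_CHARGE = 100.0     # full nitro charge


@dataclass
class NitroState:
    is_on: bool      # whether the boost is currently on
    time_on: float   # seconds since the nitro was turned on
    charge: float    # remaining charge


def clamp01(x: float) -> float:
    """Clamp a value to the range [0, 1]."""
    return max(0.0, min(1.0, x))


def nitro_observation(state: NitroState) -> list:
    """Return the three nitro observations, each scaled to [0, 1]."""
    return [
        1.0 if state.is_on else 0.0,
        clamp01(state.time_on / MAX_BOOST_TIME),
        clamp01(state.charge / MAX_CHARGE),
    ]
```

Keeping all three values on a similar scale is a common choice so that no single observation dominates the network input.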
Gather Agent Observation:
Add new Agent Action:
In TrainingEnviroment I created the following logic for the reward:
Nitro component:
The charge is restored while the nitro is turned off and drained while it is turned on.
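A minimal sketch of such a component, assuming the charge drains while the nitro is on and refills while it is off. The class name `Nitro`, the rates, and the rule that the nitro shuts off at zero charge are assumptions for illustration, not the actual component.

```python
class Nitro:
    """Hypothetical nitro component with a draining and refilling charge."""

    def __init__(self, max_charge=100.0, drain_rate=25.0, restore_rate=10.0):
        self.max_charge = max_charge
        self.drain_rate = drain_rate      # charge lost per second while on
        self.restore_rate = restore_rate  # charge regained per second while off
        self.charge = max_charge
        self.is_on = False
        self.time_on = 0.0                # seconds since the nitro was turned on

    def set_requested(self, on):
        """Apply the agent's action; nitro cannot turn on with an empty charge."""
        self.is_on = bool(on) and self.charge > 0.0
        if not self.is_on:
            self.time_on = 0.0

    def tick(self, dt):
        """Advance the component by dt seconds."""
        if self.is_on:
            self.charge = max(0.0, self.charge - self.drain_rate * dt)
            self.time_on += dt
            if self.charge == 0.0:
                # Assumed rule: an empty charge forces the nitro off.
                self.is_on = False
                self.time_on = 0.0
        else:
            self.charge = min(self.max_charge, self.charge + self.restore_rate * dt)
```

For example, with these assumed rates one second of boost costs 25 charge, and one second of rest restores 10.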
My goal was to teach the agent the following:
- If it doesn’t use nitro for a long time, it gets a penalty.
- Using nitro gives a penalty (to prevent spam switching).
- If the nitro charge is low and the nitro is turned on, the penalty is increased (to prevent spam switching).
- The longer the nitro stays on, the more reward the agent gets.
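The four rules above can be sketched as a single per-step reward function. This is only one possible reading: it treats the "using nitro" penalty as a one-time cost at the moment the nitro is switched on, and every name, threshold, and weight below is a hypothetical placeholder rather than the value used in the project.

```python
def nitro_reward(is_on, was_on, time_on, time_since_use, charge_frac,
                 idle_limit=10.0, idle_penalty=0.01,
                 switch_penalty=0.1,
                 low_charge=0.2, low_charge_penalty=0.2,
                 hold_bonus=0.02, max_hold=3.0):
    """Per-step nitro reward (all weights are assumed, untuned values).

    is_on / was_on  -- nitro state in this step and in the previous step
    time_on         -- seconds the nitro has been on
    time_since_use  -- seconds since the nitro was last on
    charge_frac     -- remaining charge as a fraction in [0, 1]
    """
    reward = 0.0

    # Rule 1: penalty for not using the nitro for a long time.
    if not is_on and time_since_use > idle_limit:
        reward -= idle_penalty

    # Rules 2 and 3: penalty at the moment of switching on,
    # increased when the charge is low.
    if is_on and not was_on:
        reward -= switch_penalty
        if charge_frac < low_charge:
            reward -= low_charge_penalty

    # Rule 4: reward that grows with the time the nitro stays on (capped).
    if is_on:
        reward += hold_bonus * min(time_on, max_hold)

    return reward
```

With these placeholder weights, switching on with a full charge costs 0.1, while each step of a boost held for two seconds earns 0.04, so rapid toggling is worth less than holding the boost. Whether that balance actually stops the spam switching depends on the step rate and the other rewards in the environment.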
Perhaps the conditions I want to implement are too complicated and I need to provide more information, for example how long it has been since the last switch, and give only a small penalty.
Perhaps the system itself is too complex and needs to be simplified.
I would be glad for any advice and for pointers to my mistakes.





