Trying to implement car boost in the Learning Agents Driving Tutorial, but I ran into an implementation problem

Hello!

I’m trying to add car nitro (boost) on top of the Learning Agents driving tutorial.
My goal is to teach the agent to keep the boost on for a sustained period of time.

The problem I ran into is that the agent toggles the nitro state very rapidly instead of holding it.

I am new to this field and may not fully understand how to design such conditions correctly. I would be very grateful for advice on how to set up this kind of behavior.

I added the following code to the tutorial:

Specify Agent Observation:

Added observations: whether the boost is on, how long the nitro has been on, and the remaining charge.
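To make the three values concrete, here is a minimal, engine-independent sketch in Python of how they could be packed into an observation vector. The names `NitroState`, `nitro_observation`, `max_time_on` and `max_charge` are hypothetical and are not part of the Learning Agents API. Scaling to the [0, 1] range is an assumption (a common choice for network inputs), not something the tutorial requires.

```python
from dataclasses import dataclass


@dataclass
class NitroState:
    is_on: bool      # whether the boost is currently active
    time_on: float   # seconds the nitro has been on in the current activation
    charge: float    # remaining charge


def nitro_observation(state, max_time_on=5.0, max_charge=1.0):
    """Return [is_on, time_on, charge], each scaled to the range [0, 1].

    max_time_on and max_charge are assumed normalisation constants.
    """
    on = 1.0 if state.is_on else 0.0
    # Clamp so that unusually long activations do not leave the [0, 1] range.
    time_norm = min(max(state.time_on / max_time_on, 0.0), 1.0)
    charge_norm = min(max(state.charge / max_charge, 0.0), 1.0)
    return [on, time_norm, charge_norm]
```

For example, `nitro_observation(NitroState(True, 2.5, 0.5))` gives `[1.0, 0.5, 0.5]` with the default constants.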

Gather Agent Observation:

Add new Agent Action:

In TrainingEnviroment, I created the following reward logic:

Nitro component:

The charge is restored while the nitro is turned off and drained while it is turned on.
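A minimal Python sketch of such a component, assuming the charge drains at a constant rate while the nitro is on and refills at a constant rate while it is off. The class name `NitroComponent`, its parameters and the rate values are hypothetical placeholders, and the forced shut-off at zero charge is an assumption too.

```python
class NitroComponent:
    """Tracks nitro charge, on/off state and time spent in the current activation."""

    def __init__(self, max_charge=1.0, drain_rate=0.25, recharge_rate=0.1):
        self.max_charge = max_charge        # full tank
        self.drain_rate = drain_rate        # charge lost per second while on
        self.recharge_rate = recharge_rate  # charge gained per second while off
        self.charge = max_charge
        self.is_on = False
        self.time_on = 0.0

    def set_requested(self, want_on):
        # The nitro can only be on while some charge is left.
        self.is_on = bool(want_on) and self.charge > 0.0
        if not self.is_on:
            self.time_on = 0.0

    def tick(self, dt):
        if self.is_on:
            self.charge = max(self.charge - self.drain_rate * dt, 0.0)
            self.time_on += dt
            if self.charge == 0.0:
                # Out of charge: force the nitro off.
                self.is_on = False
                self.time_on = 0.0
        else:
            self.charge = min(self.charge + self.recharge_rate * dt, self.max_charge)
```

With this layout, the three observed values (`is_on`, `time_on`, `charge`) all come from one place, which makes it easier to check that the observations and the reward see the same state.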

My goal was to teach the agent the following:

  • If it doesn’t use nitro for a long time, it gets a penalty.
  • Activating nitro gives a penalty (to prevent spam switching).
  • If the nitro charge is low when the nitro is turned on, the penalty is increased (to prevent spam switching).
  • The longer the nitro is on, the more reward the agent gets.
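The four rules above can be sketched as one per-step reward function. This is only an illustration in Python, not the tutorial's code. The function name, every threshold and weight, and the reading that the low-charge penalty applies at the moment of activation are all assumptions.

```python
def nitro_reward(is_on, was_on, time_on, time_off, charge,
                 idle_limit=5.0, idle_penalty=0.01,
                 switch_penalty=0.1,
                 low_charge=0.2, low_charge_penalty=0.2,
                 hold_reward_per_sec=0.05, dt=0.1):
    """Per-step nitro reward; all default values are placeholder assumptions."""
    reward = 0.0

    # Rule 1: nitro unused for a long time -> small penalty every step.
    if not is_on and time_off > idle_limit:
        reward -= idle_penalty

    # Rule 2: penalty on the step where the nitro goes from off to on.
    if is_on and not was_on:
        reward -= switch_penalty
        # Rule 3: extra penalty if it was activated with low charge.
        if charge < low_charge:
            reward -= low_charge_penalty

    # Rule 4: reward that grows with how long the nitro has been held on.
    if is_on:
        reward += hold_reward_per_sec * time_on * dt

    return reward
```

Writing the rules out this way makes it easier to compare magnitudes. If the one-off switch penalty is small relative to what the agent can gain from short bursts, rapid toggling can still be the better strategy from the agent's point of view.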

Perhaps the conditions I want to implement are too complicated and the agent needs more information, for example how much time has passed since the last switch, combined with only a small penalty.
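As an illustration of the "time since the last switch" idea, here is a small Python sketch of a tracker whose output could be fed to the agent as an extra observation. The class `SwitchTimer` and its fields are hypothetical names and are not part of Learning Agents.

```python
class SwitchTimer:
    """Tracks how many seconds have passed since the nitro state last changed."""

    def __init__(self):
        self.last_state = False
        self.time_since_switch = 0.0

    def update(self, is_on, dt):
        if is_on != self.last_state:
            # The state changed on this step: restart the timer.
            self.last_state = is_on
            self.time_since_switch = 0.0
        else:
            self.time_since_switch += dt
        return self.time_since_switch
```

The returned value would typically be clamped and scaled before being used as an observation, in the same way as the other nitro values.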

Perhaps the system itself is too complex and it needs to be simplified.

I would be glad to get any advice and pointers on my mistakes.