Trying to implement car boost in the Learning Agents Driving Tutorial, but I ran into an implementation problem

Hello!

I’m trying to implement car nitro (boost) based on a tutorial.
My goal is to teach the agent how to boost the car for a certain period of time.

I ran into the problem that the agent switches the nitro state very quickly.

I am new to this field and may not fully understand how to design such conditions correctly. I would be very grateful for any advice on designing this kind of behavior.

I added the following code to the tutorial:

Specify Agent Observation:

Added parameters: whether the boost is on, how long the nitro has been on, and the remaining charge.
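For readers without the screenshots, here is a minimal sketch of how those three values could be packed into a normalized observation. It is plain Python, not the Learning Agents API; the function name `nitro_observation` and the `max_time_on` and `max_charge` limits are assumptions made for illustration.

```python
def nitro_observation(is_on, time_on, charge, max_time_on=5.0, max_charge=1.0):
    """Pack the three nitro values into a list of floats in [0, 1].

    max_time_on and max_charge are assumed normalization limits,
    not values taken from the tutorial.
    """
    return [
        1.0 if is_on else 0.0,                      # boost on/off flag
        min(max(time_on / max_time_on, 0.0), 1.0),  # how long nitro has been on
        min(max(charge / max_charge, 0.0), 1.0),    # remaining charge
    ]
```

Keeping every value in the same 0 to 1 range is a common choice so that no single observation dominates the network input.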

Gather Agent Observation:

Add new Agent Action:

In TrainingEnviroment I created the following logic for the reward:

Nitro component:

The charge is restored while the nitro is turned off and decreases while it is turned on.
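A rough sketch of such a component, assuming the charge drains while the nitro is on and recharges while it is off. This is plain Python rather than the actual Blueprint or C++ component, and the class name, rates, and field names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NitroComponent:
    """Hypothetical stand-in for the nitro component (not engine API)."""
    max_charge: float = 1.0      # assumed full charge
    drain_rate: float = 0.5      # assumed charge lost per second while on
    recharge_rate: float = 0.25  # assumed charge regained per second while off
    charge: float = 1.0
    is_on: bool = False
    time_on: float = 0.0         # seconds the nitro has been on continuously

    def set_on(self, want_on: bool) -> None:
        # Nitro can only be on while some charge remains.
        self.is_on = want_on and self.charge > 0.0
        if not self.is_on:
            self.time_on = 0.0

    def tick(self, dt: float) -> None:
        if self.is_on:
            self.charge = max(0.0, self.charge - self.drain_rate * dt)
            self.time_on += dt
            if self.charge == 0.0:
                # Out of charge: force the nitro off.
                self.is_on = False
                self.time_on = 0.0
        else:
            self.charge = min(self.max_charge,
                              self.charge + self.recharge_rate * dt)
```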

My goal was to teach the agent the following:

  • If it doesn’t use nitro for a long time, it gets a penalty.
  • Using nitro incurs a penalty (to prevent spam switching).
  • If the nitro charge is low and the nitro is turned on, the penalty is increased (to prevent spam switching).
  • The longer the nitro is on, the more reward the agent gets.
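The four rules above can be sketched as a single per-step reward term. This is only one reading of the list: it assumes the "using nitro" penalty is applied at the moment the nitro is switched on, and every name and constant below (`nitro_reward`, `idle_limit`, `switch_penalty`, and so on) is hypothetical rather than taken from the tutorial.

```python
def nitro_reward(is_on, switched_on, time_on, time_off, charge,
                 idle_limit=5.0, idle_penalty=0.1,
                 switch_penalty=0.2,
                 low_charge=0.2, low_charge_penalty=0.3,
                 hold_bonus_per_sec=0.05):
    """Per-step nitro reward term; all thresholds are assumed values."""
    reward = 0.0
    # Rule 1: penalty for leaving the nitro unused for a long time.
    if not is_on and time_off > idle_limit:
        reward -= idle_penalty
    if switched_on:
        # Rule 2: switching on costs something, to discourage spam.
        reward -= switch_penalty
        # Rule 3: switching on with low charge costs more.
        if charge < low_charge:
            reward -= low_charge_penalty
    # Rule 4: reward grows with how long the nitro has stayed on.
    if is_on:
        reward += hold_bonus_per_sec * time_on
    return reward
```

Written out like this, it is easier to see that the rules pull in opposite directions (two penalties for turning on, one for staying off), which may be part of why the agent settles on fast toggling.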

Perhaps the conditions I want to implement are too complicated and more information is needed, for example how long it has been since the last switch, combined with a small penalty.

Perhaps the system itself is too complex and it needs to be simplified.

I will be glad for any advice and guidance on my mistakes.

I would start by making this much easier for the agent to learn. For example, make boost a one-click action that activates for X seconds and then goes on cooldown for Y seconds.
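As a rough illustration of that idea, here is a small state machine in plain Python. It is a sketch under assumptions, not Learning Agents code: the class name `OneClickBoost` and the default `duration` (X) and `cooldown` (Y) values are made up.

```python
class OneClickBoost:
    """One press activates boost for `duration` seconds, then it is
    locked out for `cooldown` seconds. Both values are assumptions."""

    def __init__(self, duration=2.0, cooldown=3.0):
        self.duration = duration
        self.cooldown = cooldown
        self.active_left = 0.0    # seconds of boost remaining
        self.cooldown_left = 0.0  # seconds of cooldown remaining

    @property
    def is_active(self):
        return self.active_left > 0.0

    @property
    def can_activate(self):
        return self.active_left <= 0.0 and self.cooldown_left <= 0.0

    def step(self, pressed, dt):
        """Advance by dt seconds; returns True while boost is active."""
        if self.is_active:
            # Presses are ignored while boosting.
            self.active_left -= dt
            if self.active_left <= 0.0:
                self.active_left = 0.0
                self.cooldown_left = self.cooldown
        elif self.cooldown_left > 0.0:
            # Presses are ignored during cooldown.
            self.cooldown_left = max(0.0, self.cooldown_left - dt)
        elif pressed:
            self.active_left = self.duration
        return self.is_active
```

With this setup the agent only decides when to press; it cannot switch the boost off early, so rapid toggling is ruled out by the environment instead of by reward shaping.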

Maybe start with this and then grow the capabilities rather than starting at the end point.

You’ll learn faster this way.

Thank you so much for your advice! I am now implementing a simpler version with simpler data: whether nitro is enabled and whether it can be enabled.

After some training time, the continuous tapping did disappear, but the agents still prefer to turn the nitro off shortly after turning it on.

Now, as I understand it, the task is to explain to the agent that too short an interval between switches is a mistake.
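One possible way to express that, sketched in plain Python: apply a penalty at the moment of a switch that grows the sooner it follows the previous one. The function name and the `min_interval` and `max_penalty` values are hypothetical, and the agent would probably also need the time since the last switch in its observations for this signal to be learnable.

```python
def short_toggle_penalty(time_since_last_switch,
                         min_interval=1.0, max_penalty=1.0):
    """Reward term to add when the agent toggles the nitro.

    Returns 0 if at least min_interval seconds have passed since the
    previous toggle; otherwise a negative value that scales linearly
    up to -max_penalty for an immediate re-toggle. Both parameters
    are assumed values to tune.
    """
    if time_since_last_switch >= min_interval:
        return 0.0
    return -max_penalty * (1.0 - time_since_last_switch / min_interval)
```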