Course: Learning Agents (5.5)

I’m not sure I fully follow how you’re picturing it, but here’s how I think about it.

The agent is learning a function (the neural network) called the policy which transforms observations into actions. The agent uses the reward to train the policy.
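If it helps to see that idea in code, here is a rough sketch in plain Python. It is not the Learning Agents API. The one-weight policy, the toy `reward_for` environment, and the hill-climbing loop are all made up for illustration, and hill climbing is just a stand-in for whatever training algorithm the course actually uses. The point is only that the policy is a function from observation to action, and that the reward decides which version of that function we keep.

```python
import math
import random


def policy(weights, observation):
    """Map an observation vector to a steering action in [-1, 1]."""
    return math.tanh(sum(w * o for w, o in zip(weights, observation)))


def reward_for(offset, action):
    """Hypothetical toy environment: steering shifts the car sideways,
    and the reward penalizes ending up far from the road center."""
    next_offset = offset + 0.5 * action
    return -next_offset ** 2


def total_reward(weights, offsets):
    """Sum the reward the policy earns over a set of starting offsets."""
    return sum(reward_for(x, policy(weights, [x])) for x in offsets)


def train(offsets, steps=500, seed=0):
    """Hill climbing: keep a random weight change only if reward improves."""
    rng = random.Random(seed)
    weights = [0.0]
    best = total_reward(weights, offsets)
    for _ in range(steps):
        candidate = [w + rng.gauss(0.0, 0.2) for w in weights]
        score = total_reward(candidate, offsets)
        if score > best:
            weights, best = candidate, score
    return weights, best


offsets = [-0.4, -0.2, 0.2, 0.4]
untrained = total_reward([0.0], offsets)
weights, trained = train(offsets)
print(untrained, trained, weights)
```

The trained policy ends up steering against the offset, even though nothing in the code tells it to. It only ever sees the reward.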

Before we add the lookahead observations, the agent has no information about the shape of the road ahead, so it can’t learn to handle a turn differently from a straightaway. Once we add the lookahead, the agent can “see” the shape of the road and can adjust its behavior, for example by anticipating turns sooner.
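Here is a small sketch of why the lookahead matters, again in plain Python rather than the actual Learning Agents observation setup. I’m assuming the road can be described as a list of headings (in degrees) at evenly spaced points, and the function names and sample tracks are hypothetical. Without lookahead, the observation just before a turn is identical to the one on a straight road, so no policy could tell them apart.

```python
def base_observation(track, index):
    """Observation without lookahead: only the road heading at the car."""
    return [track[index]]


def lookahead_observation(track, index, offsets=(1, 2, 3)):
    """Add the heading change at a few points ahead of the car."""
    here = track[index]
    last = len(track) - 1
    ahead = [track[min(index + k, last)] - here for k in offsets]
    return [here] + ahead


# Hypothetical road headings in degrees at evenly spaced points.
straight = [0, 0, 0, 0, 0, 0]
turn_ahead = [0, 0, 0, 15, 30, 45]

# Same input for both roads, so the policy must output the same action.
print(base_observation(straight, 1), base_observation(turn_ahead, 1))

# Different inputs, so the policy can learn to react before the turn.
print(lookahead_observation(straight, 1), lookahead_observation(turn_ahead, 1))
```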

Hope this helps.