Course: Learning Agents (5.5)

Hello. Thank you for this fantastic course. I am extremely new to the whole ML/DL domain and have only a basic understanding of NNs, but I have a (probably theoretical) question. I'm not asking for a thorough explanation, since that would probably require a year-long university ML course, so a short answer with some hints on what to look up would be enough.

So the question is: why does just adding lookahead observations of the track improve the outcome of the training? I don't see the relation so far, because from what I understand, actions, rewards, and completions are all executed/gathered without regard to the observations (at least in the provided BP samples). I can only guess that track spline lookahead locations and directions make training more complex, and in this case complexity is good up to a certain point, but how does it affect the inferred actions?
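To make my confusion concrete, here is a tiny toy sketch of how I currently picture it (my own made-up example, not the Learning Agents API): the policy is a function of the observation, so even though rewards are computed from the environment state alone, training shapes the observation-to-action mapping. If that's right, richer observations (like lookahead points) would simply give the policy more to condition its actions on:

```python
def policy(obs, weights):
    # Toy linear policy: steering is a weighted sum of observed
    # track directions (a hypothetical observation layout).
    return sum(w * o for w, o in zip(weights, obs))

# With only the current track direction, the policy cannot
# anticipate a turn; with lookahead values it can react earlier.
obs_no_lookahead = [0.0]               # current direction only: straight
obs_with_lookahead = [0.0, 0.3, 0.8]   # upcoming turn visible ahead

steer_blind = policy(obs_no_lookahead, [1.0])
steer_ahead = policy(obs_with_lookahead, [1.0, 0.5, 0.25])

print(steer_blind)  # 0.0  -> no turn visible yet
print(steer_ahead)  # 0.35 -> starts steering before the turn
```

Is this roughly the right mental model, i.e. the reward never needs to "see" the observations because gradient updates do the connecting?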