For each loop on TArray

I agree that some compilers do this work for us, but some do not. One should either test it or manually write an appropriate version for the specific situation.
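
To illustrate what a manual version could look like, here is a rough sketch. It assumes the work in question is hoisting the loop bound out of the loop condition, which may not be exactly the case discussed. std::vector stands in for TArray so the snippet compiles without the engine, and the names SumNaive and SumHoisted are made up for this example.

```cpp
#include <cstddef>
#include <vector>

// Assumption: std::vector is used as a stand-in for TArray.

// The bound is queried in the loop condition on every iteration,
// unless the compiler proves it invariant and hoists it on its own.
long long SumNaive(const std::vector<int>& Values)
{
    long long Sum = 0;
    for (std::size_t i = 0; i < Values.size(); ++i)
    {
        Sum += Values[i];
    }
    return Sum;
}

// The bound is read once by hand, so the result does not depend
// on the compiler doing the hoisting for us.
long long SumHoisted(const std::vector<int>& Values)
{
    long long Sum = 0;
    const std::size_t Count = Values.size();
    for (std::size_t i = 0; i < Count; ++i)
    {
        Sum += Values[i];
    }
    return Sum;
}
```

Both functions return the same result. Whether the second one is actually faster depends on the compiler and its settings, so it is worth measuring or checking the generated assembly before relying on it.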

General.
Off-topic: with the Intel C++ compiler, the O2 or O3 optimization levels might be useful.