float4 vs float, which is faster?

Keep in mind that a float4 is just four floats packed into one value. It is a small fixed-size vector type (basically a struct), not a queue or a resizable container. For a float4 named v, the components are:
v.x, v.y, v.z, v.w
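
To make the layout concrete, here is a minimal C++ sketch of what a float4 amounts to. The names Float4, add and demo are made up for illustration; real shader languages and GPU toolkits ship their own built-in float4 type, and this is only a stand-in for it.

```cpp
#include <cassert>

// Hypothetical stand-in for a built-in float4: four floats side by side.
struct Float4 {
    float x, y, z, w;
};

// Component-wise add: the same operation applied to all four lanes.
Float4 add(const Float4& a, const Float4& b) {
    return Float4{a.x + b.x, a.y + b.y, a.z + b.z, a.w + b.w};
}

// Small self-check showing how the components are read back.
void demo() {
    Float4 a{1.0f, 2.0f, 3.0f, 4.0f};
    Float4 b{10.0f, 20.0f, 30.0f, 40.0f};
    Float4 c = add(a, b);
    assert(c.x == 11.0f);
    assert(c.y == 22.0f);
    assert(c.z == 33.0f);
    assert(c.w == 44.0f);
}
```

A plain float would be just one of those four lanes, which is why comparing the two on "speed" alone does not really make sense.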

It's not really a matter of speed. float and float4 are completely different things: one holds a single value, the other holds four. It's not like int32 vs int64, where both hold one number and only the min/max range differs.

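Where float4 may help is on the GPU: its four components can sometimes be loaded or processed together by parallel (vector) instructions, so if you need to send a lot of values to the GPU in one batch, packing them into float4s may be faster than handling them one float at a time.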
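
As a rough sketch of the batching idea, the C++ below packs a flat list of floats into groups of four before it would be handed to the GPU. Float4 and pack are hypothetical names, the zero padding of the last group is an assumption, and the actual upload call depends on whatever graphics or compute API you use, so it is left out.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a built-in float4.
struct Float4 {
    float x, y, z, w;
};

// Pack a flat list of floats into float4-sized groups.
// The last group is padded with zeros if the count is not a multiple of 4.
std::vector<Float4> pack(const std::vector<float>& values) {
    std::vector<Float4> packed((values.size() + 3) / 4,
                               Float4{0.0f, 0.0f, 0.0f, 0.0f});
    for (std::size_t i = 0; i < values.size(); ++i) {
        Float4& slot = packed[i / 4];
        switch (i % 4) {
            case 0: slot.x = values[i]; break;
            case 1: slot.y = values[i]; break;
            case 2: slot.z = values[i]; break;
            default: slot.w = values[i]; break;
        }
    }
    return packed;
}
```

Whether this actually runs faster depends on the hardware and the compiler, so measure it on your own target.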