float4 vs float, which is faster?

Hi guys. I’m just curious about an HLSL performance question: are operations on float4 variables or on float variables faster? Or are they about the same?

e.g.
float4 * float4 vs float * float
pow(float4, float4) vs pow(float, float)
lerp(float4, float4, float4) vs lerp(float, float, float)
sin(float4) vs sin(float)
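As far as the results go, each of these float4 operations works component by component. One float4 operation therefore does the arithmetic of four float operations. Below is a minimal Python sketch of that behavior. The `Float4` class and the helper names `pow4`, `lerp4` and `sin4` are hypothetical stand-ins for the HLSL overloads. The sketch shows what gets computed, not how fast a particular GPU computes it.

```python
import math
from dataclasses import dataclass


# Hypothetical CPU-side model of an HLSL float4, for illustration only.
# Python floats are 64-bit, so this models the semantics, not GPU precision or speed.
@dataclass(frozen=True)
class Float4:
    x: float
    y: float
    z: float
    w: float

    def components(self):
        return (self.x, self.y, self.z, self.w)

    def __mul__(self, other):
        # float4 * float4: one scalar multiply per component (4 in total).
        return Float4(*(a * b for a, b in zip(self.components(), other.components())))


def lerp(a, b, s):
    # Scalar lerp, as HLSL defines it: a + s * (b - a)
    return a + s * (b - a)


def pow4(a, b):
    # pow(float4, float4): one scalar pow per component.
    return Float4(*(math.pow(p, q) for p, q in zip(a.components(), b.components())))


def lerp4(a, b, s):
    # lerp(float4, float4, float4): one scalar lerp per component.
    return Float4(*(lerp(p, q, t) for p, q, t in zip(a.components(), b.components(), s.components())))


def sin4(a):
    # sin(float4): one scalar sin per component.
    return Float4(*(math.sin(p) for p in a.components()))


a = Float4(1.0, 2.0, 3.0, 4.0)
b = Float4(2.0, 2.0, 2.0, 2.0)
print(a * b)                                    # Float4(x=2.0, y=4.0, z=6.0, w=8.0)
print(pow4(a, b))                               # Float4(x=1.0, y=4.0, z=9.0, w=16.0)
print(lerp4(a, b, Float4(0.5, 0.5, 0.5, 0.5)))  # Float4(x=1.5, y=2.0, z=2.5, w=3.0)
```

Whether those four per-component operations cost the same as one scalar operation or about four times as much depends on how the GPU executes them. The sketch cannot show that.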

This may help us write better materials. Does anyone know the underlying mechanism? Thanks.

You understand that a float4 holds four floats (it is a 4-component vector type), accessed as:
v.x, v.y, v.z, v.w (where v is a float4)
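Here is a rough Python model of that layout, assuming a hypothetical `Float4` class. It has four named float components, swizzle-style access similar to HLSL's `v.wzyx`, and a packed size of four 32-bit floats (16 bytes). The `swizzle` method is an illustrative helper, not real HLSL syntax.

```python
import struct
from dataclasses import dataclass


# Hypothetical model of an HLSL float4: four independent float components.
@dataclass
class Float4:
    x: float = 0.0
    y: float = 0.0
    z: float = 0.0
    w: float = 0.0

    def swizzle(self, mask):
        # Rough analogue of HLSL swizzles: swizzle("xy") ~ v.xy, swizzle("wzyx") ~ v.wzyx
        return tuple(getattr(self, c) for c in mask)

    def to_bytes(self):
        # Pack as four consecutive little-endian 32-bit floats (16 bytes in total).
        return struct.pack("<4f", self.x, self.y, self.z, self.w)


v = Float4(1.0, 2.0, 3.0, 4.0)
print(v.x, v.w)            # 1.0 4.0
print(v.swizzle("wzyx"))   # (4.0, 3.0, 2.0, 1.0)
print(len(v.to_bytes()))   # 16
```

Each component is just an ordinary float. A float4 is not a "bigger" float with more range or precision.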

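It’s not really a matter of speed: float and float4 are different kinds of type. It’s not like int32 vs int64, where the two types have different min/max values. Each component of a float4 is an ordinary float, so a float4 operation does the work of four float operations.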

Depending on the GPU, float4 operations may be able to use more parallel (vector) instructions that process several components at once. This can help if you need to send many values to the GPU in one batch.