There is no 1/N term involved.
If you implemented the CPU-side code as a direct copy of the GPU code, as I already mentioned, then you do not need to add or alter anything.
First check that every step gives the same values on the CPU and the GPU, starting with h0(k) and then h(t,k).
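
One way to do that step-by-step check is sketched below. It is only a sketch: it assumes you can read back each intermediate buffer (h0(k), h(t,k)) from both sides as a flat list of complex numbers. The helper names `max_abs_diff` and `first_mismatch`, the tolerance of 1e-5, and the sample values are all made up for illustration and are not from your code.

```python
def max_abs_diff(cpu_values, gpu_values):
    """Largest |cpu - gpu| over two equally long sequences of complex numbers."""
    if len(cpu_values) != len(gpu_values):
        raise ValueError("buffers differ in length")
    return max((abs(c - g) for c, g in zip(cpu_values, gpu_values)), default=0.0)


def first_mismatch(stages, tol=1e-5):
    """Return (name, error) for the first stage whose CPU and GPU buffers
    differ by more than tol, or None if every stage agrees.

    stages: list of (name, cpu_values, gpu_values) in pipeline order.
    tol: assumed tolerance for single-precision GPU results; tune as needed.
    """
    for name, cpu_values, gpu_values in stages:
        err = max_abs_diff(cpu_values, gpu_values)
        if err > tol:
            return name, err
    return None


# Made-up sample data standing in for buffers read back from each side.
h0_cpu = [1.0 + 2.0j, -0.5 + 0.25j]
h0_gpu = [1.0 + 2.0j, -0.5 + 0.25j]
ht_cpu = [0.3 - 0.1j, 0.7 + 0.2j]
ht_gpu = [0.3 - 0.1j, 0.35 + 0.1j]  # deliberately wrong second entry

result = first_mismatch([("h0(k)", h0_cpu, h0_gpu), ("h(t,k)", ht_cpu, ht_gpu)])
print(result)
```

Checking the stages in pipeline order means the first reported name is where the two implementations start to diverge, so you only need to look at the code for that one step.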