Great, but... a double precision solution, please?

Handle it? Yes. Handle it efficiently? No.

Most SIMD registers on CPUs are still 128 bits wide. So each operation gets you either 2x64-bit values (e.g., doubles) or 4x32-bit values (e.g., floats). That means doubles need twice as many math ops as floats to do the same amount of work, and you also miss out on various 4-lane tricks.
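To put rough numbers on that, here is a minimal sketch of the arithmetic in C. It assumes a 128-bit register and counts only how many vector operations one pass over an array takes; it does not touch any real SIMD intrinsics. The names `SIMD_REGISTER_BITS`, `lanes_per_register`, and `vector_ops_needed` are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed register width in bits (illustrative, not queried from the CPU). */
#define SIMD_REGISTER_BITS 128

/* How many elements of the given size fit side by side in one register. */
static size_t lanes_per_register(size_t element_bytes) {
    assert(element_bytes > 0 && element_bytes * 8 <= SIMD_REGISTER_BITS);
    return SIMD_REGISTER_BITS / (element_bytes * 8);
}

/* Vector operations needed for one pass over n elements, rounding up
   so that a partially filled final register still counts as one op. */
static size_t vector_ops_needed(size_t n, size_t element_bytes) {
    size_t lanes = lanes_per_register(element_bytes);
    return (n + lanes - 1) / lanes;
}
```

With the usual 4-byte float and 8-byte double, `vector_ops_needed(1024, sizeof(float))` comes out to 256 and `vector_ops_needed(1024, sizeof(double))` to 512, which is the 2x difference described above.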