In the Pixel Coordinates section, the method you mentioned:
scale = max( image width, image height )
px = scale*m[0] + image width / 2
py = scale*m[1] + image height / 2
seems like it should instead be:
scale = min( image width, image height )
px = scale*m[0] + image width / 2
py = scale*m[1] + image height / 2
Using minimization worked really well in my case.