The calling code adds offsets to the FetchRowOfFour call, so the offsets it ends up using are not only positive.
For the 4x4 filter, the sample center is offset by (-1,-1). It matches the gather code.
For the 7x7 filter, the sample center is offset by (-3,-3). It is half a texel off the gather code.
I don’t see more than half a texel shift anywhere.
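
A quick way to sanity-check the half-texel figure, as a sketch only: it assumes each offset above is the first tap of the footprint along one axis and that the taps sit on consecutive texels. `footprint_center` is a hypothetical helper for illustration, not something from the shader code.

```python
def footprint_center(first_offset, taps):
    """Center of a 1D run of `taps` consecutive texels.

    Assumption: `first_offset` is the texel offset of the first tap
    relative to the base texel, and taps are one texel apart.
    """
    last_offset = first_offset + taps - 1
    return (first_offset + last_offset) / 2


# 4x4 filter: taps at -1, 0, 1, 2 along each axis
center_4x4 = footprint_center(-1, 4)

# 7x7 filter: taps at -3 .. 3 along each axis
center_7x7 = footprint_center(-3, 7)

# Distance between the two footprint centers, in texels
shift = abs(center_4x4 - center_7x7)
print(center_4x4, center_7x7, shift)
```

Under those assumptions the even-sized footprint is centered between texels and the odd-sized one is centered on a texel, so the two can differ by at most half a texel per axis.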