OK, I figured it out. It adds 2 instructions to the material, but it is hard to say whether that cost is meaningful, or whether the white optimization will matter much in actual usage.
The top two lines need to say:
float rayheight=0.996;
float oldray=0.996+stepsize;
Adding stepsize to oldray lets the intersection math handle the slope the same way it does for later steps (it simulates a step having already been taken).
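To show why the offset matters, here is a rough CPU-side sketch in C of the intersection math for a hit on the very first sample. It assumes the usual linear intersection between the ray segment and the height segment; hit_fraction, first_sample_fraction and oldtex are made-up names for illustration, and the starting oldtex of 1 is an assumption, not something taken from the actual material code.

```c
/* Fraction of the last step to back up after a hit, found by intersecting
   the ray segment (oldray -> rayheight) with the height segment
   (oldtex -> texatray). Hypothetical helper, not the original node code. */
static float hit_fraction(float oldray, float rayheight,
                          float oldtex, float texatray)
{
    float denom = (oldray - oldtex) + (texatray - rayheight);
    return (texatray - rayheight) / denom;
}

/* Same math for a hit on the first sample, using the two starting values
   from the post. */
static float first_sample_fraction(float stepsize, float texatray)
{
    float rayheight = 0.996f;
    float oldray = 0.996f + stepsize; /* pretend one step was already taken */
    float oldtex = 1.0f;              /* assumption: previous sample starts at 1 */
    return hit_fraction(oldray, rayheight, oldtex, texatray);
}
```

With a stepsize of 0.125 and a pure white texel (height 1), the denominator comes out to about 0.125 and the fraction to about 0.03. If oldray started equal to rayheight, the same case would give a denominator of 0.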
Another option to optimize a POM that requires tons of steps would be to generate a distance field from the initial pixels and use that as the first step distance. The cost of the extra lookup and math would probably be a wash unless you were doing lots of steps.
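One possible reading of the distance field idea, as a rough C sketch: look up a precomputed "safe distance" for the starting pixel and jump the ray that far before the regular stepping begins. Everything here is hypothetical (RayStart, first_step_from_distance_field, safe_dist, uvdist), and it assumes the distance field stores how far the ray can travel in UV units before it could possibly hit the surface.

```c
/* Ray state after the initial jump. All names are made up for illustration. */
typedef struct {
    float offset;    /* distance travelled along the ray, in UV units */
    float rayheight; /* ray height after the jump */
    float oldray;    /* height one regular step earlier, for the intersection math */
} RayStart;

/* safe_dist: value read from an assumed precomputed distance field.
   uvdist:    UV distance covered by one regular step.
   stepsize:  height lost per regular step. */
static RayStart first_step_from_distance_field(float safe_dist, float uvdist,
                                               float stepsize)
{
    float skipped = safe_dist / uvdist; /* regular steps covered by the jump */
    RayStart s;
    s.offset = safe_dist;
    s.rayheight = 0.996f - skipped * stepsize;
    s.oldray = s.rayheight + stepsize;  /* same trick as the two lines above */
    return s;
}
```

The regular loop would then start from this state instead of from offset 0. That is one extra texture lookup plus a divide and a multiply-add, so it only pays off if the jump replaces several regular steps.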