Engine optimizations

Thank you anyway. As the project progresses, it will surely improve the performance of the engine.

Very cool stuff, found your articles to be well presented and very educational!

I'd be interested to see whether you can profile VR mode (e.g. Oculus Rift) against regular rendering, to find out whether the way the rendering happens can be optimized. In its current state, the frame rate can easily drop from, say, 180+ fps to below 60 at the same resolution (hmd sp 100). Given that 75+ fps is the requirement from DK2 onward, some miracles will need to be found here :)

I’ve found duplication of work between eyes and some post-process stuff that should be running across the entire image rather than per eye. Largely, I think there’s some caching architecture that needs to be put in place to truly speed up stereo rendering. This is, unfortunately, messy and complicated (and way too big of a project).

Thankfully the work Pablo is doing will pay dividends for both mono and stereo users. I think our best bet in the long run is hoping Epic makes stereo rendering optimizations a priority before CV1 lands.

I added a new blog post about optimizing FDeferredShadingSceneRenderer::InitViews() by making the front-to-back sorting of base pass draw lists asynchronous.

Hi PZurita. I’d be interested in your opinion on this question I posted on the AnswerHub a while ago. It’s just a very simple query relating to how UE4 deals with vector component indexing. Not likely to be all that performance critical, but it did seem strange to me so I’d like to understand if/why it’s standard practice.

I can’t really answer that because, in all cases so far, I have used VectorRegister to get a proper SSE/NEON vector register, and I keep all my math work vectorized so I don’t go back to scalar floats. If I need to grab just one component for whatever reason, I would use VectorSelect(), which returns a vector.

Well, that may in fact explain it. I have no experience with using specialized data structures for vector math, but if that is standard for performance-critical code, then perhaps there is just no reason for the basic vector class to be fully optimized. Thanks for the info.

I just read through your AABB optimization blog in order to learn a bit about VectorRegister use. Your explanations are very well written, thanks!

Something occurred to me that suggested a further (minor) optimization possibility in the matrix transform. Since I don’t know much about this stuff, and I’m aware that sometimes what would seem like an optimization actually isn’t due to hardware specifics, I thought I’d run it by you.

In FBox::TransformBy(FMatrix) there is the following:

r0 = VectorAdd(r0, m3);
r1 = VectorAdd(r1, m3);
r2 = VectorAdd(r2, m3);
r3 = VectorAdd(r3, m3);
r4 = VectorAdd(r4, m3);
r5 = VectorAdd(r5, m3);
r6 = VectorAdd(r6, m3);
r7 = VectorAdd(r7, m3);

I assume this is applying the translation component of the transform. Since it adds the same constant offset to every vertex in the destination coordinate space, I believe it can’t change which vertices win the subsequent min/max tests. Therefore these 8 vector additions could be omitted, with just the 2 resulting min and max vectors being translated at the end. Does that make sense, or am I missing something?
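To make the reasoning concrete, here is a minimal scalar sketch of the idea, not the engine's code. `Vec3`, `Box`, `BoundsTranslateFirst`, and `BoundsTranslateLast` are hypothetical stand-ins rather than UE4 types, and the sketch assumes the 8 corners have already been rotated/scaled so that only the translation remains to be applied.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>

// Hypothetical stand-ins for illustration only; these are not UE4 types.
struct Vec3 { float x, y, z; };
struct Box  { Vec3 min, max; };

inline Vec3 Add(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
inline Vec3 Min(Vec3 a, Vec3 b) { return {std::min(a.x, b.x), std::min(a.y, b.y), std::min(a.z, b.z)}; }
inline Vec3 Max(Vec3 a, Vec3 b) { return {std::max(a.x, b.x), std::max(a.y, b.y), std::max(a.z, b.z)}; }

// Variant A: translate all 8 corners, then take min/max (8 additions).
Box BoundsTranslateFirst(const std::array<Vec3, 8>& corners, Vec3 t) {
    Box b{Add(corners[0], t), Add(corners[0], t)};
    for (std::size_t i = 1; i < corners.size(); ++i) {
        const Vec3 p = Add(corners[i], t);
        b.min = Min(b.min, p);
        b.max = Max(b.max, p);
    }
    return b;
}

// Variant B: take min/max of the untranslated corners, then translate
// only the two results (2 additions).
Box BoundsTranslateLast(const std::array<Vec3, 8>& corners, Vec3 t) {
    Box b{corners[0], corners[0]};
    for (std::size_t i = 1; i < corners.size(); ++i) {
        b.min = Min(b.min, corners[i]);
        b.max = Max(b.max, corners[i]);
    }
    return {Add(b.min, t), Add(b.max, t)};
}
```

Both variants should return the same box, because adding the same offset to every corner preserves their ordering along each axis, so the same corners supply the min and max either way.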

This particular method has been optimized in the current master branch (future 4.8 release), see here:

Ah thanks. I checked 4.7 but not master. Yep, I’d figured that transforming all the vertices had to be more work than was necessary.