A new technique to render images.

I’m not entirely sure where to post this, but the mods can move it if needed. I hope this can attract the attention of the engine development team. Maybe this is something of worth to you all, maybe not; who knows. I’m providing my idea here for free, with the only request that if you use it, I’d really like to see what you’ve used it for, and maybe get credit for the idea, though credit isn’t necessary. If my technique can change the way things are rendered for the better for everyone, I feel that would outweigh the need for any credit.

I’ve had this idea for quite a while, probably close to fifteen years now. I’ve tried implementing it several times, but I just can’t figure out how. I’ve got the math down: it’s generic geometry, easy stuff, but implementing it has been a pain for me. So I’m going to share my idea here with you all, hoping I can get some feedback from the rendering people at Epic, or anyone. Mind you, I’ve never told anyone this technique; this is the first time I’m coming forward with it. I don’t think anyone has ever had the idea, or at least if they have, they haven’t publicly posted it on the internet. I’ve searched high and low for something similar but never found anything even remotely close. I’m not even sure if it’s possible, or if it’s even worth the trouble. I just think it may have the potential to significantly improve performance on mobile, and on any device really. Anyway, on to my rendering technique.


Unfortunately this is not viable as it makes collision tracing impossible.

I would not immediately dismiss this idea. This is something we’ve been doing as an optimization for full screen passes in professional games development for a number of years now.

I think this could be advantageous for something like a compute-based particle system, where you’d calculate all of your collisions using the particle’s size, then generate a list of particles to draw from the GPU. Using this technique, you could halve the number of primitives you need to render. Combined with vertex instancing, this becomes a pretty decent time saver.

The only downside I see is that you’d have a lot of dead pixel space, so you’d be paying for a lot of blend operations where nothing happens.

That being said, if you didn’t need to write depth info (maybe even if you did), you could likely compute all of your tri-quads in a compute shader, writing a list of them out, then in another compute shader process them all in a tile-based manner and only apply the parts that have a valid pixel contribution. Though I have no idea what the final timing would look like.

Still, interesting nonetheless …

Thank you for sharing your very special idea SaxonRah!

You could always map out some ideas with the HUD now that we have draw triangle in the HUD class!

	/**
	 * Draws a set of triangles on the Canvas.
	 * @param RenderTexture				Texture to use when rendering the triangles. If no texture is set, then the default white texture is used.
	 * @param Triangles					Triangles to render.
	 */
	UFUNCTION(BlueprintCallable, Category=Canvas, meta=(FriendlyName="Draw Triangles"))
	void K2_DrawTriangle(UTexture* RenderTexture, TArray<FCanvasUVTri> Triangles);


class ENGINE_API FCanvasTriangleItem : public FCanvasItem

Those tests would give you a better sense of what needs to change on the Engine render side!

I hope Epic checks this out!



Wow. This is interesting. I have no idea how to implement it, but it could make CPU particles twice as dense at the very least, which could bring better particle effects to people with a weak or nonexistent GPU.

In a way this is possible right now; it’s just a matter of actually taking the time to do it. You would make a masked material and create a mesh of a single triangle. It’s just a matter of getting the UVs right.

I have a few solutions for this; I just need to get an actual implementation working first.

Really? That is interesting stuff. I will need to check out the triangular full-screen pass ASAP; there could definitely be some solutions I’m looking for in this. I guess people have had the idea! Superb! Using a single triangle to render the entire screen is one implementation I had thought about long ago, but I disregarded it because I thought a full-screen triangle pass might be less performant due to the wasted space off screen. But I guess scissoring and clipping are your friends.

You’re welcome! And thank you for this! I had no idea. I’m going to take a few days off work and mess around with this!! <3

Indeed, I’m always looking to push rendering and graphics to the next level, but by reducing resource use as well.

I thought about that, but have had no time to implement a solution. Maybe you could help me out by making a right triangle with proper UVs and a non-right triangle with proper UVs. You don’t have to, but that would be epic!

I suck at working with UVs, but I do believe it’s possible. You would need to make sure the material is clamped rather than tiled/repeated. It would need to be masked as well, and you would probably need a border on the texture that is fully masked out (transparent).

Cool! Thanks for the info. I’ll have a stab at it.

This is a really lame response because I can’t explain the why, but I have made particle systems using one triangle, mapping a square to it. I was told by the engine programmer this wasn’t saving any performance over a quad. Like I said, I don’t remember why, even though it made sense to me at the time.

The main reason this isn’t done is that on modern hardware, triangles are cheap; drawing pixels isn’t. Even when masked or alpha blended, the invisible pixels still need to be evaluated before the GPU knows they are invisible. (Using the shader complexity view in UE4 is a great way to visualize this.)

So in general it is more efficient on modern hardware to do the complete opposite and use more triangles to create a better cutout of your shape. Here’s a Siggraph presentation describing this in more detail.

The previously mentioned full-screen triangle instead of a quad is indeed a special case, as it does not require additional pixels to be shaded. Although this was also mostly relevant on older hardware: in an age where you can easily have a few million triangles in view, saving a single one on a full-screen quad is not going to help much.

Hi Guys,
I was going to reply last night but looks like Arnage beat me to it and said pretty much the exact same thing I was going to say :slight_smile:

This does work well for triangle shaped Pine tree LODs where the shape you are representing is already a triangle. In other cases we actually have to add more polys to foliage to carefully cut around the masked section to reduce overdraw. Even on iOS.

I actually tested this out once when making the cherry blossom tree for the iOS demo, since all the blossoms were camera-facing billboards. It actually cost much more on device, and looking at shader complexity it was clear why. A triangle surrounding a quad has twice the surface area, and 100% of the area outside the ‘quad’ within the tri will be overdraw. That means it gets more than twice as slow to render, since the original quad area will most likely be fairly filled in. Let’s say the original quad had 50% masked and 50% opaque pixels. By making it into a single triangle, suddenly only 25% of the area is actually rendering pixels and the rest is overdraw.

That said, I did end up sort of doing something like this for one of the pine tree LODs for GDC. It’s not really the same though. Basically I made the pine clusters into tris instead of quads, but I also scaled up the UVs and cut into the shape quite a bit to combat the above-mentioned overdraw problem. Since it was for LOD2 it happens far away and you can’t tell:


I’ve read that for post-processing, using a full-screen tri is far better than a quad, because along the diagonal seam from corner to corner the overlapping pixels essentially get computed twice, causing a waste of time. Is this true?


Cool! Thanks for all the help you guys. Exactly the kind of feedback i was looking for.

Would it be wrong to say that GPU vendors have kind of overvalued the triangle and essentially ignored the pixel?

There seems to be an area which could be highly optimized: why does the GPU care at all about 100% invisible pixels? It shouldn’t need to evaluate the 100% invisible pixels to know that they are in fact invisible. That seems awfully redundant. It ‘should’ only compute pixels which have some nonzero alpha value.

It has to render the whole triangle one way or another. If the whole triangle is just part of an image, it’s fast. However, since in this case part of the triangle is not visible, that part has to be rendered anyway, and then the GPU has to determine not to draw anything. This isn’t as fast as just drawing the image. Two triangles (the complete image) are going to render faster than a single triangle with a more complex case. I didn’t think of this before, but it makes a hell of a lot of sense.

I’m afraid I do not know enough about the exact impact this could have to answer this.

The problem is that a GPU can only know a pixel is 100% invisible after it has evaluated the shader for that pixel. How can you know a texture has an alpha of 0 for that pixel without reading that texture first? The same goes for other shader logic that can influence the final alpha.

Hmm this is true. Thanks :slight_smile:

"Would it be wrong to say that gpu vendors have kinda over valued the triangle and essentially ignored the pixel?"

There is another way to think about it: there are generally way more pixels than triangles being rendered for a given mesh/material. That is the main reason it is way faster to compute things in the vertex shader than the pixel shader (in UE4 this is done via CustomUVs). Since there are fewer verts, it is faster to calculate something per vertex than per pixel.

Ah! That’s a much better way of looking at it, and also very interesting. Thanks for the new outlook! I will definitely keep this in mind when developing passes.