CustomDepth incurs huge performance hit

RenderDoc shows that with roughly 150 or more primitives using CustomDepth + Stencil, half of the scene render time is spent on the CustomDepth passes alone. FPS tanks when CustomDepth is used extensively in more complicated scenes.

CustomDepth is currently our only way to mask decals to specific objects, but we can’t use it without taking this performance hit. We need either a way to sample the primitive ID natively from the SceneDepth scene texture, so we don’t have to use CustomDepth at all, or an optimization of CustomDepth itself (preferably both).