I can give you some additional context, since it’s useful background to understand.
Nanite replaces much of the traditional rendering pipeline with its own separate system for the meshes that use it. How triangles are processed and rasterized, how materials are applied to them, and how their shadows are rendered all work differently.
In a traditional rendering pipeline, every vertex has to be processed individually. This gets expensive, because each vertex carries its own data: position, normal, texture coordinates, and often a vertex color. That’s quite a lot of information even for a 1,000-triangle mesh. Passing that geometry to a material adds more work, because every material compiles to a Vertex Shader and a Pixel Shader. Any logic in the Vertex Shader (typically things like World Position Offset and skinning) runs once per vertex, so the more vertices a mesh has, the more that work costs. This is why games usually want meshes to be as low-poly as possible, and use LODs to swap in even lower-poly versions at a distance. In game rendering, every vertex matters. It’s also the reason we use normal maps: a normal map stores per-pixel surface normals in a texture, so lighting can respond to fine surface detail that would otherwise require super detailed geometry.
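To put rough numbers on “every vertex matters,” here is a minimal Python sketch. It assumes a hypothetical, uncompressed vertex layout (float3 position, float3 normal, float2 UV, RGBA8 color); real engines pack and compress vertex data, so treat the byte counts as illustrative only. The toy `run_vertex_shader` function is a stand-in for a material’s Vertex Shader applying a constant World Position Offset, and it simply counts how many times it runs.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]

# Hypothetical uncompressed vertex layout, in bytes (real engines pack/compress this).
VERTEX_LAYOUT = {
    "position": 3 * 4,  # float3
    "normal": 3 * 4,    # float3
    "uv": 2 * 4,        # float2
    "color": 4,         # RGBA8
}


def vertex_stride() -> int:
    """Bytes stored per vertex under the hypothetical layout above."""
    return sum(VERTEX_LAYOUT.values())


def vertex_buffer_bytes(vertex_count: int) -> int:
    """Total vertex buffer size for a mesh with this many vertices."""
    return vertex_count * vertex_stride()


def run_vertex_shader(vertices: List[Vec3], wpo: Vec3) -> Tuple[List[Vec3], int]:
    """Toy vertex shader: offsets every vertex by a constant World Position Offset.

    Returns the transformed vertices and the number of invocations, which is
    always one per vertex, so doubling the vertex count doubles this work.
    """
    out: List[Vec3] = []
    invocations = 0
    for x, y, z in vertices:
        invocations += 1
        out.append((x + wpo[0], y + wpo[1], z + wpo[2]))
    return out, invocations


if __name__ == "__main__":
    # A 1,000-triangle mesh with no shared vertices has 3,000 vertices.
    print(vertex_stride())             # 36
    print(vertex_buffer_bytes(3_000))  # 108000
    tri = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
    _, calls = run_vertex_shader(tri, (0.0, 0.0, 0.5))
    print(calls)                       # 3
```

With shared (indexed) vertices a mesh needs fewer unique vertices, but the scaling is the same: vertex work grows linearly with vertex count.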
Now, Nanite throws all of this out the window. It uses what it calls a “cluster” system: detailed meshes are broken into small groups of triangles (clusters), and each cluster is culled individually against the view frustum and against occluders. That means groups of triangles, even within the same mesh, can be skipped when they aren’t visible. Nanite also precomputes progressively simplified versions of those clusters and, at runtime, picks the level of detail for each part of the mesh based on how large it appears on screen. This is essentially LODing done per cluster instead of per mesh, which is why Nanite meshes don’t need traditional LODs. It’s also the reason material and shadow processing has to be so different: the cluster system isn’t very compatible with traditional per-vertex methods. And it’s why Nanite is designed to work alongside the other next-gen features: Lumen, Virtual Texturing, Virtual Shadow Maps, and temporal upscaling.
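The core ideas can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not Nanite’s implementation: it groups triangles into fixed-size clusters (the cap of 128 is an illustrative parameter, and real Nanite groups spatially adjacent triangles rather than chunking them in index order), culls each cluster’s bounding sphere against frustum planes, and picks a detail level by projecting each level’s simplification error onto the screen. The names `make_clusters`, `cluster_visible`, and `pick_level` are hypothetical, and occlusion culling is left out entirely.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Triangle = Tuple[Vec3, Vec3, Vec3]
Plane = Tuple[Vec3, float]  # (unit normal n, d): point p is inside when dot(n, p) + d >= 0


@dataclass
class Cluster:
    triangles: List[Triangle]
    center: Vec3
    radius: float


def make_clusters(triangles: Sequence[Triangle], max_tris: int = 128) -> List[Cluster]:
    """Chunk triangles into clusters and compute a bounding sphere for each.

    Simplification: index-order chunking instead of grouping by spatial adjacency.
    """
    clusters = []
    for start in range(0, len(triangles), max_tris):
        tris = list(triangles[start:start + max_tris])
        pts = [p for tri in tris for p in tri]
        center = tuple(sum(p[i] for p in pts) / len(pts) for i in range(3))
        radius = max(math.dist(p, center) for p in pts)
        clusters.append(Cluster(tris, center, radius))
    return clusters


def cluster_visible(c: Cluster, planes: Sequence[Plane]) -> bool:
    """Frustum test: reject the cluster if its sphere is fully outside any plane."""
    for n, d in planes:
        signed = n[0] * c.center[0] + n[1] * c.center[1] + n[2] * c.center[2] + d
        if signed < -c.radius:
            return False
    return True


def pick_level(level_errors: Sequence[float], distance: float,
               fov_y: float, screen_height: int, threshold_px: float = 1.0) -> int:
    """Pick the coarsest level whose world-space error projects to <= threshold_px pixels.

    level_errors[0] is full detail (error 0); errors increase with each coarser level.
    """
    pixels_per_unit = screen_height / (2.0 * distance * math.tan(fov_y / 2.0))
    chosen = 0
    for level, err in enumerate(level_errors):
        if err * pixels_per_unit <= threshold_px:
            chosen = level
    return chosen


if __name__ == "__main__":
    tri = ((0.0, 0.0, 5.0), (1.0, 0.0, 5.0), (0.0, 1.0, 5.0))
    clusters = make_clusters([tri] * 300)
    print([len(c.triangles) for c in clusters])        # [128, 128, 44]
    near_plane = ((0.0, 0.0, 1.0), -1.0)                # keep everything with z >= 1
    print(cluster_visible(clusters[0], [near_plane]))   # True
    errors = [0.0, 0.01, 0.1, 1.0]
    print(pick_level(errors, 10.0, math.pi / 2, 1080))    # 1 (close: fine detail)
    print(pick_level(errors, 1000.0, math.pi / 2, 1080))  # 3 (far: coarsest level)
```

The same object gets a detailed level up close and a coarse one far away, and each cluster makes that choice on its own, which is the per-cluster LOD idea described above.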
This is obviously a much more complicated topic, since it’s a new method and gets very technical, but here is a fantastic breakdown of the system if you want to learn more.
As to why individually modeled geometry is cheaper for the system than masked materials: because of the way Nanite determines clusters and visibility, it struggles to cull efficiently whenever there is heavy overdraw, meaning the same pixel gets shaded several times. Overdraw happens far more with masked materials than with fully modeled meshes, because the mask has to be evaluated for every pixel a card covers, including the pixels it ends up discarding, and the cut-out areas don’t hide anything behind them. (Translucent materials aren’t supported by Nanite at all.) Here is a great test that someone ran showing the overdraw differences. This is probably Nanite’s weakest area, since small, densely overlapping meshes like leaves or blades of grass also cause lots of overdraw. In general, Nanite has no great solution for scenes with lots of foliage, but it’s widely agreed that modeled geometry is at least the better option. Overall, just watch your overdraw, no matter which method you end up using.
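Overdraw is easy to measure in principle: count how many pixels get shaded and divide by how many pixels end up covered. The Python sketch below is a toy screen-space model that assumes axis-aligned quads, no depth testing (so it shows the worst case), and a hypothetical per-pixel `mask` function standing in for an opacity mask. Its point is that a masked card pays for every pixel it covers, even the ones the mask throws away.

```python
from typing import Callable, List, Set, Tuple

Rect = Tuple[int, int, int, int]   # x0, y0, x1, y1 in pixels (x1/y1 exclusive)
Mask = Callable[[int, int], bool]  # True where the (hypothetical) opacity mask keeps the pixel


def overdraw_stats(quads: List[Tuple[Rect, Mask]], width: int,
                   height: int) -> Tuple[int, int, float]:
    """Return (pixels shaded, pixels covered, shaded / covered).

    Every on-screen pixel inside a quad is shaded, because the mask can only be
    tested by running the shader; only pixels that pass the mask count as covered.
    Depth testing is ignored, so this is a worst-case estimate.
    """
    shaded = 0
    covered: Set[Tuple[int, int]] = set()
    for (x0, y0, x1, y1), mask in quads:
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                shaded += 1
                if mask(x, y):
                    covered.add((x, y))
    n_covered = len(covered)
    return shaded, n_covered, (shaded / n_covered if n_covered else 0.0)


def solid(x: int, y: int) -> bool:
    """Fully opaque surface, like modeled geometry."""
    return True


def checker(x: int, y: int) -> bool:
    """Half the pixels are cut out, like a sparse leaf card."""
    return (x + y) % 2 == 0


if __name__ == "__main__":
    # Two overlapping opaque quads: 32 pixels shaded, 24 covered.
    print(overdraw_stats([((0, 0, 4, 4), solid), ((2, 0, 6, 4), solid)], 8, 8))
    # One masked card: 16 pixels shaded, only 8 kept, so the ratio is 2.0.
    print(overdraw_stats([((0, 0, 4, 4), checker)], 8, 8))
```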
Best of luck with your project. Let me know if you have any other questions!