I’ve found another resource regarding this, from the great article here:
In contrast to classic mips, the virtual texturing system only streams in the parts of a texture that are actually needed on screen. It does this by splitting every mip level into tiles of a small, fixed size; I believe it's 128x128 texels. The GPU determines which tiles are needed for the visible pixels on screen, and those requests are fed back so the required tiles can be loaded into a GPU memory cache. No matter how large the texture is, only the visible tiles are considered, so the system is much more granular and far better at requesting only texels that contribute to the final image. With classic mips, if even a small part of an object is visible, the entire object counts as visible and UE makes sure the whole texture for that object is loaded. SVTs are kind of like Nanite for textures: if 90% of an object is hidden behind a tree, only the texture tiles needed for the visible 10% are loaded.
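To make that granularity concrete, here's a minimal Python sketch (my own illustration, not engine code) of how a pile of per-pixel lookups collapses into a small set of tile requests, assuming the 128x128 tile size mentioned above:

```python
TILE_SIZE = 128  # tile edge length in texels (the size mentioned above)

def tiles_needed(visible_uvs, texture_size):
    """Map visible pixels' UV coordinates to the set of tiles they touch.

    visible_uvs: iterable of (u, v) pairs in [0, 1)
    texture_size: texture edge length in texels (assumed square)
    """
    needed = set()
    for u, v in visible_uvs:
        tx = int(u * texture_size) // TILE_SIZE
        ty = int(v * texture_size) // TILE_SIZE
        needed.add((tx, ty))
    return needed

# An 8192x8192 texture has 64x64 = 4096 tiles, but a few visible
# pixels clustered in one corner only ever request the tiles they land in.
uvs = [(0.001, 0.001), (0.002, 0.0015), (0.0013, 0.002)]
print(tiles_needed(uvs, 8192))  # all three pixels fall in tile (0, 0)
```

The key point is the deduplication into a set: thousands of pixels that land in the same tile cost one tile load, and tiles nothing looks at are never requested at all.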
Another benefit is that you're not restricted to picking a single mip level for an object. Say you have a texture you're viewing nearly edge-on, like a road texture on the ground or a wall receding from the camera. The parts of that texture you see at high magnification (the parts close to you) need the full resolution, but as you approach the vanishing point you need smaller and smaller mips. VTs can request high-res tiles for the areas close to you and low-res tiles for the distant ones, so you can potentially save yourself quite a bit of memory here!
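The per-pixel mip choice behind this follows the standard GPU rule of taking log2 of the texel footprint (how many texels one screen pixel covers). A simplified Python sketch of that rule, not UE's actual code:

```python
import math

def mip_for_pixel(texels_per_pixel):
    """Pick a mip level from how many texels one screen pixel covers.

    Standard GPU mip selection: mip = log2(texel footprint), clamped at 0.
    1 texel/pixel -> mip 0 (full res); 16 texels/pixel -> mip 4.
    """
    return max(0, int(math.log2(max(texels_per_pixel, 1.0))))

# A ground plane viewed edge-on: the footprint grows toward the horizon,
# so nearby pixels request mip 0 tiles while distant pixels request much
# smaller mips, all from the same texture at the same time.
for texels in (1, 2, 8, 64, 512):
    print(texels, "texels/pixel -> mip", mip_for_pixel(texels))
```

With classic streaming the whole texture is resident at whichever mips are loaded; with VTs each tile request carries its own mip, which is where the memory saving on edge-on surfaces comes from.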
So why isn't this the default in the engine? Well, it probably will be sometime in the near future, but there are a couple of potential pitfalls with SVTs that need to be addressed. First, there is a performance cost: virtual textures add GPU and CPU time to process the requests for which tiles to stream. It's not huge, but it's not nothing either, and it goes up with every VT you are rendering. They also have their own memory pool, which will probably need to be adjusted, and their own VT-specific cache that sits between the DDC and the editor; both may need to be managed. So there is a "VT Tax" here. That being said, I don't think there is much reason to avoid them entirely. In fact, if you want to use UDIMs in your project, they have to be VTs.
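As a back-of-the-envelope illustration of why the "VT Tax" can still be worth paying, here's my own arithmetic using the 128x128 tile size from above and an assumed 4 bytes per texel (uncompressed RGBA8; real textures are usually block-compressed, so treat the ratios, not the absolute numbers, as the point):

```python
TILE = 128  # tile edge in texels
BPP = 4     # assumed bytes per texel (uncompressed RGBA8)

def full_mip0_bytes(size):
    """Memory for the top mip of a square texture, streamed whole."""
    return size * size * BPP

def tiles_bytes(n_tiles):
    """Memory for only the tiles that are actually resident."""
    return n_tiles * TILE * TILE * BPP

size = 8192
print(full_mip0_bytes(size) // 2**20, "MiB for the whole top mip")
# If only ~10% of the object is visible, roughly 10% of its
# 64x64 = 4096 tiles might need to be resident:
print(tiles_bytes(410) // 2**20, "MiB for ~410 resident tiles")
```

The fixed per-request CPU/GPU overhead is paid either way, but the resident memory scales with what's on screen rather than with the size of the source texture.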