I’ve been working on my very own Voxel-based game for a while now. Moved from an internal engine (hand-written from the ground up aroung OpenGL 4.0, by myself) but moved to UE4, since 4.8 added the new ProceduralMeshComponent nad it’s awesome! Super easy to get something on screen! Managed to get some noise in and got this (Google Drive).
I used OnConstruction event from Actor class to generate the noise and build the mesh
A Cell contained 64x64x64 Voxels and there were 4 Cells!! That’s 1,048,576 calls to GenerateNoise! This was taking ~30-45 seconds to complete!
Now the real problem, I re-coded this whole thing, and started to use the Async<T> function from 4.8 (with ThreadPool) and I’m building a 3D Grid of 4x4x4 Cells, sized at 32x32x32. Moved the genereation code to OnBeginPlay, but the editor still hang for ~2-3 seconds to build the Cells. And I’m not even generating noise! Why is this happening?
If you have suggestions or ideas regarding my problem or you need explanation, don’t hesitate to post!
P.S. : if you want to view the code, I uploaded it to GitHub : link’s here.
Edit : new repo is on BitBucket : here, the GitHub repo won’t be updated anymore (might be deleted in the future).
My understanding of how Minecraft was implemented is that it does some major optimisations by not generating voxels that have neighbours on all six sides until one of the neighbouring voxels is removed. The voxel exists in concept and has a material and position in the tree, but none of the other work around it is done until it’s needed. You can see this especially good when the game lags and a chunk doesn’t load - black space where solid ground exists, but you’ll see pockets of lava hanging in space for miles.
Along those lines, are you using an octree to optimise it further? Just having static arrays of voxels is going to be way slower than using a tree to progressively yet quickly decide which cells to remove. Octrees scale well too. Here’s an overview in case you haven’t seen it before: Octree - Wikipedia
I bet UE has a kickass octree implementation that you can just extend. I would both use an octree to optimise voxel instantiation and a tree for voxel list chunking. Store them in an octree for quick access and then create a duplicate octree with total solid voxel count for quick elimination of large chunks.
These are basically all the same problems you face with landscape CLODs except in three dimensions rather than two. Which straight up is a challenging but well-trodden path.
From all this it’s pretty clear that UE badly needs an actively developed open source voxel library. You should definitely keep working on it.
Edit: looks like a TMap is a tree and is probably an octree, so that’s step one done. Now figure out how you can use it to optimise everything.
I thought about using an Octree but my brain hurst when thinking about this.
Is a Cell equivalent to a Leaf node?
How do I manage nearly-infinite dimensions on all 3 axis?
How do I make it thread-safe (like not reading while writing and not writing on a Cell when it is locked)?
And last, even if I know I’m not yet in the optimisation stage, but how to make it blazing fast?
I’ve been searching on Google and in GitHub for possible implementations for the last month and couldn’t find an example with thread-safety. If you could point me in the right direction I’ll be happy!!!
And can someone help me about those threads? I can’t find any doc about locking mechanisms in Unreal…
I don’t think about octrees in their hierarchical state much either. The key thing to remember is they’re cubic regions delimited by eight points, hence octree. Check the picture on that wikipedia article, it shows a cube illustration kind of demonstrating an octree search.
A cell is probably equivalent to a leaf node, yes. It will contain an array of pointers to your actual voxel type structs.
Infinity is a big concept. You manage huge dimensions by garbage collection and writing to disk. Minecraft has a massive potential world size but it doesn’t load it all at once. Minecraft also benefits by not writing unmodified regions to disk explicitly - it just regenerates them as needed using the world’s seed. They don’t get saved until you do something to them, and even then you could just save a diff to disk. The limitation is only a factor of your coordinate system, which is going to allow distances so large you shouldn’t worry about it. Even then, the octree naturally provides a segmented coordinate system to overcome that limitation.
You can speed all this up by using an immutable set of chunk holders that just exchange chunks and recycle themselves rather than being destroyed. I read an article about this the other day that suggested real performance gains can be had by having a constant number of objects that are recycled.
Just design to avoid thread overlap. When you fire off a thread, chances are you’re going to be using it to navigate the octree to convert a world coordinate into a voxel ID. Do something like have threads that arrive at a particular voxel enter a queue that can quickly check if the voxel was destroyed and end the queue if necessary. On the other hand, once you know you’re in a cell and not pointing at a neighbouring cell, you can revert back to raycasting and other things that UE already provides and optimises. You don’t need to use the octree for every tiny action.
It’ll be blazing fast just because it’s an octree. See below about the memory/generation time tradeoff.
Things to know about octrees:
There’s a cost to generating the initial tree (that you can then cache to disk so it never has to happen again until it’s invalidated).
There’s a cost to any changes you make.
The octree itself ensures that any changes happen more quickly than just walking an array.
The larger your data set, the better the octree performs as compared to walking the array.
Memory usage does go up fast though.
The only tradeoff to consider is memory usage versus array search time in the leaf node.
If you know quadtrees, they’re exactly like a 3d version of that. You begin with one base region that can be assumed to contain everything. You then divide it into eight equal regions that each contain a list of everything inside them. Each region repeats until you’ve attained the resolution you care about - the remaining nodes can be walked like a normal array.
So if you have a cube of 256[SUP]3[/SUP] voxels, you’d need 8 iterations to get each chunk down to 256 items. At a guess that’s when memory management starts becoming a thing and is probably where the actual work lies. You could disregard memory for now and just work on a smaller total number of voxels just to get the idea working, then improve it. But again, I bet UE has something you can ride off of as its physics system almost certainly uses an octree to optimise collision testing.
One of the biggest constraints on minecraft is in the vertical axis - you can only go 128 blocks up or down I think. Mods that raise this limit require lots of memory on the host AND the client. This limit is worth considering.
Also probably worth remembering, don’t store your voxels individually. Have a subset of types and use pointers to reference them. This should minimise your memory usage pretty good.
Also do vertex welding and several LODs on your generated geometry (although I don’t know as much about this as I’d like). Having a look at how CLODs weld polygons and reduce triangles in the distance. Your problem here is even simpler, you just want to combine chunks that aren’t at any risk of being modified, such as chunks with no players in them.
So this is just an overview of how I believe it works. You shouldn’t re-invent the wheel though - go out there and find an implementation that does exactly what you want and study it until it makes sense, then refactor it into UE. I would be surprised if there isn’t something already in UE that you can make use of. Eg: https://docs.unrealengine.com/latest/INT/API/Runtime/Engine/TOctree/index.html
Edit: actually answered your questions.
Edit: changing my answers.
At the time I begun coding, I didn’t know it existed! I’ll be using that in the future.
As for the TOctree class, I’m blown away by the lack of any sense that’s making for me! The class FOctreeElementId has no way of pointing to a tree (that I found).
How do I search in the tree? Like if I want to get all elements in the region delimited by 2 points (X, Y, Z), how would I do that?
Yes, I’ve looked in the NavigationOctree and PrimitiveSceneOctree but I can’t make any sense of it all!
Is using something like this viable?
IIRC it was because at the time FIntVector didn’t work with blueprint, or wasn’t editable in property sheets. That may have been fixed by now.
I’m afraid I’m responsible for TOctree as well. You can use TOctree::TConstElementBoxIterator to iterate over elements that intersect some bounding box.
However, I don’t think you should use an octree to store your voxels. In BrickGame, I just store them in a flat array for each region, with regions sparsely accessed through a TMap. This is pretty inefficient in terms of disk usage, but IMO is more suited to modern PCs. It packs the voxels into contiguous memory, which uses memory bandwidth and cache space more efficiently than a sparse data structures. Computers can crunch a lot of numbers, so clever tricks to avoid crunching numbers can easily hurt more than they help.
A concrete example of this is that in the latest BrickGame code, I use parallel tasks to generate the terrain. Since I store the voxels in a flat array with all voxels at the same XY stored contiguously, each task can independently generate the voxels for a different XY, and there are no locks or complicated data structures to synchronize access to. The cache lines should even match up pretty well to the tasks.
Using a TMap to store individual voxels allows you to avoid storing any data for empty voxels, but at great cost. It adds at least 8 bytes of overhead to each voxel, and probably more like 23 bytes of overhead (hash table pointer + per-element pointer + per-element pointer alignment). All that overhead for 2 bytes of data! Just using a dense array of voxels will be smaller in many cases, and much faster to read and write.
Didn’t think that TMap would THAT much overhead! Thanks for the tip!
I’ll give the dense array a try tonight when I get home.
Could you post a simple (as in init+term of the tasks) example code of your parallel tasks? My knowledge of multi-threading is quite limited and could you some help (read the wiki articles by Rama-sama but I didn’t quite understood).
Yeah, I’d say the learning curve here is steep, but you can do it! Actually, I probably launched into the hardcore explanation too soon and I feel like my first reply wasn’t very good, so let’s cover off some basics first. Watch this tutorial: Quadtree Explanation - YouTube - this is about quadtrees, which is a 2d version of an octree.
Let’s ignore voxels for now, the important thing to know first is that everything with a physical position lives in the octree. Objects and stuff will know their own location, but at the start of your level you’ve queried all of objects in space and copied their pointers into the octree, indexing them. **Their location in space determines where they go in the octree.
**So if you wanted to, say, get all of the irregular elements in a particular region. You might have billions of objects in space and you can’t just test the range of each one every single frame. Instead you index them into an octree at the start of the game, then when you need to test nearby objects, do this:
Set a cubic search region with the player at its center. It can be any size or proportion, but it must be regular, all corners 90 degrees, etc.
For each bounding point/corner of the search region, work out what octree node the bounding point lies in by searching down through the octree’s branches starting with the biggest and ending at a leaf. This will get you the chunks you need to iterate over. Your octree might have tens of thousands of leaf nodes, so you use the branches to narrow it down to maybe 6-8 tests. Remember, each node in the tree is a bounding box itself, which contains smaller bounding boxes. This goes a bit like this:
“I have sections A, B, C and D. I’m going to test all of them. My point lies in 1,1,3, which falls inside the bounds of B.”
“Now I have sections B.A, B.B, B.C and B.D. Upon further testing I’m actually in B.C.”
This carries on until you have B.C.A.B, with each node becoming more precise. That’s 4 iterations, rather than the 1024 iterations required to brute-force test every single node in an octree that goes 4 iterations deep.
Anyway, you’ll probably end up with 1, 2, 4 or more leaf nodes to search, depending on your region size. Any leaf nodes that reside TOTALLY inside your region are considered “found” - all the elements inside those nodes are in your search region. But we still need to handle partially intersected nodes.
Loop over the remaining leaf nodes you’ve partially intersected and test the position of each element inside that node to see if it’s inside your search region’s bounds. The more levels in your octree, the smaller this eventual region is, so the faster it is to search.
Take your resulting element list and do other things to it, like perform even more searching in other ways, or load them into an array for consideration during gameplay events. You can do things like when the player’s position crosses a chunk border, dump the list and build it again starting from step 1.
So that’s the example for finding nearby items that just randomly float around the world with varying precision.
Now let’s set the context to voxels. The reason you’re using an octree is NOT to make searching for voxels fast - since you’re already using a regular grid, the voxel selection part is simply a 3d range of grid points. The reason you use an octree for voxels is to provide chunking for threading and fast culling, as well as helping you decide when to do garbage collection and eliminate some chunks from memory. Ultimately your octree will help you quickly find out chunk info like “am I completely empty or completely solid” or “do I have any modifications”.
So to break it down further:
If you want to simply get all the voxels in a known range (say you’re going to spawn an explosion and you want to know what voxels to test for destruction) use the octree to get a manageably small set of voxels and just refer to your new voxel pointer array and do some modulus. Google modulus if that’s a new term, it’s used to figure out which units in an array are next to each other if they’re reconfigured into a grid.
If you want to divide voxel labour into threads, use the octree.
If you want to figure out which chunks to load, octree
If you want to find all of the monsters walking around a particular region, use the octree.
You’d still need to do all the minecraft-style optimisations we talked about above such as culling completely surrounded voxels and not drawing geometry you know that the player can never see, but those operations take time and you only want to do them on certain blocks anyway, so you can use the octree to prepare the answers for a lot of different questions that you’ll ask during gameplay.
When it comes to updating your octree, do that by chunks as well. Use the octree to determine what’s “active” and then let those objects use the octree to figure out what their chunk bounds are. Then they can update their location in the octree if they leave that chunk. Every so often you re-evaluate what’s active and update that list.
It sounds like you’re our resident expert. Question - as the grid size increases, doesn’t it become helpful again to index those dense arrays into a tree? It’s super fast to say thread a GPU to check a thousand voxels, but in large world scenarios you’d have to see some immediate benefit.
Depends on what you’re passing. If you can instance the smaller actors and use the GPU to draw the same geometry in lots of different locations, that’s super fast. If all your geometry is different, it’s better for the geometry to be in one large instruction rather than thousands of tiny calls.
Holy Cow, that’s a lot to chew! I’ve read Antidamage’s posts like 5 times and I’m not sure I understood all (at work, reading from my cell phone). Gonna read all of it again and ask more specific questions when I get home tonight.
Every time someone solves all these problems, they more or less end up with the exact same solution. This is a good example of that. It’s time someone just released a large-scale voxel implementation library and saved everyone the work.
Personally I’m using 5x5 vert heightmap-capped cubes (rolling hills are much friendlier for AI navigation) so a number of assumptions re: mesh and visibility that apply to uniform-hulled voxels have to be discarded. But in general I agree that, yeah, it would be nice to see a solid BSD/MIT-licensed Open* library for dealing with the common issues.
After all I’ve been through to get where I am now and all the work still laid in front of me, I’ll give it to this awesome community as a thank you! I’ll create a new thread when I’ve got something working.
In the mean time, quick question for the Awesome Antidamage : the leafs of the octree would be some king of chunk (dense array) of arbitrary size (say 16 cubed) or I misunderstood?