Hi, this is my first post on the UE4 forums, so let me know if I break any of the posting rules.
So, I have built an application outside UE4 that I am progressively bringing to UE4. One of its features is that it has a custom physics engine implemented entirely from scratch.
My question is the following: do you know whether it is possible to use OpenCL with UE4 to accelerate a custom physics engine? That is, I would drop Nvidia PhysX and port my own engine instead. I ask because my impression is that UE4 does not allow GPU acceleration for its built-in components, but I wonder whether it is possible if the GPU-accelerated algorithms run outside UE4's built-in components such as PhysX.
I’ve previously used OpenCL on both an Intel CPU and an Nvidia GPU, with some success. The issue we ran into was an error related to available GPU resources (I can’t remember exactly; it was some months ago, sorry). My conclusion was that the competition between my constantly churning OpenCL calculations and the totally separate video rendering was too much for the card. This was on a GTX 970.
As for whether it’s possible: if you can work around the above (or if the error wasn’t caused by what I thought and you don’t run into it), it absolutely is. You’ll want to interact with your OpenCL drivers from a separate thread and communicate back to the main thread in a preferably non-blocking way. I used a circular buffer for this: if you always have more buffers than readers plus writers, the writer can copy into several uncontested buffers at once while the readers consume any of the filled ones, then the writer moves on to new ones, and so on.
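To make the non-blocking hand-off more concrete, here is a minimal sketch in standard C++ of a lock-free single-producer/single-consumer ring buffer. It is one common way to get this behaviour, not necessarily the exact scheme described above: it assumes exactly one thread (the one that would talk to the OpenCL driver) pushes results and exactly one thread (the game thread) pops them. `ComputeResult`, `SpscRing` and `runRingDemo` are hypothetical names, and the OpenCL work is replaced by a placeholder loop.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <optional>
#include <thread>
#include <vector>

// Hypothetical payload produced by the compute thread (e.g. contact results).
struct ComputeResult {
    int frameId = 0;
    float value = 0.0f;
};

// Lock-free single-producer/single-consumer ring buffer.
// One slot is always left empty so "full" and "empty" can be told apart.
template <typename T, std::size_t Capacity>
class SpscRing {
public:
    // Producer thread only. Returns false instead of blocking when full.
    bool tryPush(const T& item) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t next = (head + 1) % Capacity;
        if (next == tail_.load(std::memory_order_acquire)) {
            return false;  // full: caller drops or retries later, never waits
        }
        slots_[head] = item;
        head_.store(next, std::memory_order_release);  // publish the filled slot
        return true;
    }

    // Consumer thread only. Returns nothing instead of blocking when empty.
    std::optional<T> tryPop() {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        if (tail == head_.load(std::memory_order_acquire)) {
            return std::nullopt;  // empty
        }
        T item = slots_[tail];
        tail_.store((tail + 1) % Capacity, std::memory_order_release);  // free the slot
        return item;
    }

private:
    std::array<T, Capacity> slots_{};
    std::atomic<std::size_t> head_{0};  // next slot the producer writes
    std::atomic<std::size_t> tail_{0};  // next slot the consumer reads
};

// Usage sketch: a stand-in compute thread pushes `count` results while the
// consuming thread drains whatever is ready, never blocking on the producer.
std::vector<int> runRingDemo(int count) {
    SpscRing<ComputeResult, 8> ring;
    std::thread producer([&] {
        for (int i = 0; i < count;) {
            if (ring.tryPush({i, i * 0.5f})) ++i;
            else std::this_thread::yield();  // full: yield and retry
        }
    });
    std::vector<int> received;
    while (static_cast<int>(received.size()) < count) {
        while (auto r = ring.tryPop()) received.push_back(r->frameId);
        std::this_thread::yield();  // a real game thread would carry on with the rest of Tick
    }
    producer.join();
    return received;
}
```

Neither side ever waits on a lock: a full buffer makes `tryPush` return false and an empty one makes `tryPop` return nothing, so the game thread can simply move on when no results are ready yet.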
Let me know if you have any questions! It’s been a while but I can try to advise.
Hi @HateDread, many thanks for your insightful reply. You even touched on one of the things that has been bugging me: how much I can make rendering and calculations compete. From my limited experience with GPU computing, it seems better to give the GPU only a few specific tasks while it renders complex scenes. For instance, mine is a 3D scene with somewhat complex, medium- to high-poly objects. At first (before using UE4) I ported all the repetitive algorithms of the physics engine to the GPU, but it was too much for the card. Now that I am bringing the whole setup to UE4, I plan to focus the GPU on two tasks.
First, I thought of leaving broad-phase collision detection on the CPU, since it is fast enough there anyway, and running only the mesh-mesh narrow phase on the GPU, since that is what needs the speed-up and it is only needed occasionally. Second, I thought of relying more heavily on the GPU for the geometry pre-processing I have to do, such as 3D convex hulls, some Catmull-Clark smoothing passes, and some polygon decomposition for navigation meshes.
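A minimal sketch of the CPU side of that split, assuming one axis-aligned bounding box per body: a sort-and-sweep broad phase along the X axis that produces the candidate pairs which would then be sent to the GPU narrow phase. `Aabb` and `broadPhasePairs` are hypothetical names, and sort-and-sweep is just one common broad-phase technique, not necessarily what the original engine uses.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical axis-aligned bounding box for one body.
struct Aabb {
    float min[3];
    float max[3];
};

// True if the boxes overlap (or touch) on all three axes.
static bool overlaps(const Aabb& a, const Aabb& b) {
    for (int axis = 0; axis < 3; ++axis) {
        if (a.max[axis] < b.min[axis] || b.max[axis] < a.min[axis]) return false;
    }
    return true;
}

// Sort-and-sweep broad phase along X: returns index pairs (i < j) whose boxes
// overlap. Only these pairs would go to the expensive mesh-mesh narrow phase.
std::vector<std::pair<std::size_t, std::size_t>> broadPhasePairs(const std::vector<Aabb>& boxes) {
    std::vector<std::size_t> order(boxes.size());
    std::iota(order.begin(), order.end(), 0);
    std::sort(order.begin(), order.end(), [&](std::size_t a, std::size_t b) {
        return boxes[a].min[0] < boxes[b].min[0];
    });

    std::vector<std::pair<std::size_t, std::size_t>> pairs;
    for (std::size_t i = 0; i < order.size(); ++i) {
        const Aabb& a = boxes[order[i]];
        // Later boxes start at or after a.min[0]; stop once one starts past a.max[0].
        for (std::size_t k = i + 1; k < order.size() && boxes[order[k]].min[0] <= a.max[0]; ++k) {
            if (overlaps(a, boxes[order[k]])) {
                pairs.emplace_back(std::min(order[i], order[k]), std::max(order[i], order[k]));
            }
        }
    }
    return pairs;
}
```

Only the returned pairs need the mesh-mesh test, which keeps the GPU workload small and occasional, as intended above.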
One of my doubts about the feasibility of this is the following. Since the UE4 API is not thread-safe, I’m unsure whether those calculations (especially the narrow-phase collision detection) would behave correctly, because they would have to pull data from the main CPU thread and pass results back to the objects on the game thread, and my understanding is that main-thread objects can’t be accessed from other threads, not even for CPU multithreading. Do you have any advice about that? That is, how much trouble did you have retrieving information from, and sending it to, game objects handled on the main CPU thread?
I hope that doing only what you describe will be fine alongside the rendering tasks, but I worry that the problem is less about sharing a given amount of resources and more about clashes in access (race conditions) and other quirks of strong-arming your way into sharing the GPU. Maybe start small and slowly increase the compute load in a UE4 scene with a constant rendering load (i.e. keep the rendering consistent for the sake of the test)? You might have issues straight away, or you might get up to a certain point before they appear; I’m not sure. Our problems seemed to come out of the blue, but I was changing a lot of code very quickly, so it may have been my fault.
Even if we ignore the above, there are going to be some issues with using the GPU this way. Namely, it’s not like a conventional fork/join thread model, which is my preference. In that model, you wake threads, distribute work to them, and put them back to sleep when they’ve finished, while the main thread waits for the work to complete (probably doing some of the work itself). When the GPU is involved, you have no real control over when the work finishes, and you’re unlikely to get the compute results back to the CPU within a single frame. Timing will be somewhat inconsistent, so you can’t lockstep your framerate to the compute rate: results might arrive 40 times a second, with widely varying intervals between them.
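For comparison, the CPU fork/join model described above can be sketched in standard C++ as follows. For brevity this version creates threads on each call instead of waking a persistent pool of sleeping workers, which is what you would do in practice; `forkJoinFor` is a hypothetical helper name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Fork/join: split [0, count) into chunks, run them on worker threads plus the
// calling thread, and return only when every chunk has finished.
void forkJoinFor(std::size_t count, unsigned workerCount,
                 const std::function<void(std::size_t)>& body) {
    const std::size_t chunks = static_cast<std::size_t>(workerCount) + 1;  // workers + caller
    const std::size_t chunkSize = (count + chunks - 1) / chunks;

    auto runChunk = [&](std::size_t c) {
        const std::size_t begin = c * chunkSize;
        const std::size_t end = std::min(count, begin + chunkSize);
        for (std::size_t i = begin; i < end; ++i) body(i);
    };

    // Fork: hand chunks 1..workerCount to worker threads.
    std::vector<std::thread> workers;
    workers.reserve(workerCount);
    for (unsigned w = 1; w <= workerCount; ++w) workers.emplace_back(runChunk, w);

    runChunk(0);  // the caller (e.g. the game thread) does its share too

    // Join: block until all workers are done, so results are ready this frame.
    for (auto& t : workers) t.join();
}
```

The key property is the join: when `forkJoinFor` returns, every result is ready, which is exactly the guarantee you lose once the work is queued on the GPU.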
To counter this, I used the model I described in my previous post: a single thread that talks to the GPU, and a lockless buffering system between it and the main thread. This way, you can access the results on Tick and always get the newest available data, but you won’t accidentally stall the main thread (which could happen if you used a mutex or another locking mechanism). I have not tried to pull any game objects out onto other threads, and I wouldn’t advise it, though you’re welcome to try so I can see how it works. Whenever you want to do that, your best bet is to move the computations (rather than the objects) onto other threads and use the wake/sleep (fork/join) model I described to ensure the results are available in the frame you need them, though this would all be on the CPU.
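One way to build the "newest possible data, never stall" hand-off is a lock-free triple buffer. The sketch below illustrates that technique in standard C++ and is not necessarily the exact buffering system described above; `TripleBuffer` and `runTripleBufferDemo` are hypothetical names, and the demo's producer thread stands in for the thread that talks to the GPU.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Lock-free triple buffer: the producer always owns one slot to write, the
// consumer always owns one slot to read, and the third slot is swapped between
// them atomically. The consumer sees the newest completed result and never waits.
template <typename T>
class TripleBuffer {
public:
    // Producer thread only: publish a finished result.
    void publish(const T& value) {
        slots_[back_] = value;
        // Swap our freshly written slot into the middle and flag it as unread.
        const unsigned old = middle_.exchange(back_ | kFresh, std::memory_order_acq_rel);
        back_ = old & kIndexMask;
    }

    // Consumer thread only (e.g. from Tick): returns true and updates `out` if a
    // newer result arrived since the last call; otherwise leaves `out` untouched.
    bool tryGetLatest(T& out) {
        if ((middle_.load(std::memory_order_relaxed) & kFresh) == 0) return false;
        const unsigned old = middle_.exchange(front_, std::memory_order_acq_rel);
        front_ = old & kIndexMask;
        out = slots_[front_];
        return true;
    }

private:
    static constexpr unsigned kFresh = 4u;      // flag bit: middle slot holds unread data
    static constexpr unsigned kIndexMask = 3u;  // low bits: slot index 0..2
    T slots_[3]{};
    unsigned back_ = 0;                // owned by the producer
    unsigned front_ = 1;               // owned by the consumer
    std::atomic<unsigned> middle_{2};  // shared; starts with no fresh data
};

// Usage sketch: a stand-in compute thread publishes 1..count while the consumer
// polls as a Tick would. Returns the last value seen, or -1 if values ever went backwards.
int runTripleBufferDemo(int count) {
    TripleBuffer<int> buffer;
    std::thread producer([&] {
        for (int i = 1; i <= count; ++i) buffer.publish(i);
    });
    int last = 0;
    int value = 0;
    bool monotonic = true;
    while (last < count) {
        if (buffer.tryGetLatest(value)) {  // never blocks
            if (value <= last) monotonic = false;
            last = value;
        } else {
            std::this_thread::yield();  // nothing new yet; a real Tick would just carry on
        }
    }
    producer.join();
    return monotonic ? last : -1;
}
```

Results published while the game thread was busy are simply overwritten, which matches the "whatever you get will be the newest possible data" behaviour; if every result must be consumed, a queue such as a ring buffer fits better.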