Stale FMassCommandBuffer::OwnerThreadId from aborted background priority tasks leads to false positive "Commands can be pushed only in the same thread where the command buffer was created." check.

There is a path of execution where FMassCommandBuffer::OwnerThreadId can no longer match the correct thread id when running the task for a FMassEntityQuery::ParallelForEachEntityChunk. This in turn leads to the 2nd check in the COMMAND_PUSHING_CHECK macro to fail incorrectly, since OwnerThreadId doesn’t match FPlatformTLS::GetCurrentThreadId().

checkf(OwnerThreadId == FPlatformTLS::GetCurrentThreadId(), TEXT(“Commands can be pushed only in the same thread where the command buffer was created.”))

  1. The task runs as normal for the given worker id for a first batch of jobs, ParallelForImpl::ParallelForInternal::FParallelForExecutor::operator() -> ParallelForImpl::CallBody -> FMassEntityQuery::ParallelForEachEntityChunk::lamba2 runs, calls the lazy init FTaskContext::GetCommandBuffer, which allocates the command buffer and creates it with the current thread id, for the FTaskContext associated with that worker id.
  2. The first batch for that task completes, and in FParallelExecutor::operator(), we’re a background priority task, and fall through to rescheduling the task. We launch a new task with the same worker id.
  3. The new (rescheduled) task is picked up by a different worker thread, which starts working another batch of jobs.
  4. We once again hit the lazy init FTaskContext::GetCommandBuffer, which uses the same FTaskContext as before (since it has the same worker index). We see that the command buffer already exists, so we don’t create a new one.
  5. We’re now in a state where we call the body (the lambda for our FMassEntityQuery::ParallelForEachEntityChunk call) with FMassCommandBuffer::OwnerThreadId not matching our current thread id (it matches the thread id from the task that ran the 1st batch). And commands pushed to the command buffer will fail the check. This results in the check failing, but we’re not actually performing unsafe concurrency, since the task should still be exclusively accessing the command buffer in this situation.

Thank you for bringing this to our attention! It is not something we had thought of happening or run into internally. Your description and breakdown of the problem is fantastic. It helped tremendously in getting a fix together. The fix is currently in review, but it is currently looking like it will not make it into a hotfix. I will keep tabs on the review to be able to share the CL once it is in a public stream.

-James

Great news! This change has been approved and is in CL 43905432 in UE5 Main stream. I can help track the GitHub commit as well in case you are using that.

Thanks for the quick turnaround! Our studio will probably pick this up when we do our next engine integration.

We are happy to help! And thank you once again for bringing this to our attention. Little things like this make the engine better for everyone. If you do bump into issue with this or it does not fully fix the problem you encountered, please let us know.