CHAOS_SCENE_LOCK_TYPE and hitching in cooked builds

Hello,

We have a need to cast a LOT of rays and sweeps (e.g. SweepMultiByChannel). Often, upon discovering the results of one cast or sweep, we must do more casts and sweeps, or other work, which suggests that a round-trip through Unreal’s built-in async ray casting system isn’t appropriate for this use case. The UE::Tasks::Launch system offers a fantastic way to perform our work on the background worker threads, with inter-task dependencies and everything, and Chaos seems to handle ray casts from not-the-main-thread safely.

It takes many frames (a handful of seconds, even) for all the rays to be cast and for the results to be collated. We’re OK with this delay, because the work runs in the background and (mostly, hence the forthcoming question…) doesn’t affect the frame rate, and the game design tolerates it.

Since we’re casting so many rays, we’re taking a lot of read locks on the scene throughout the frame. Of course, various physics updates have to take write locks, and we don’t want our readers to delay those write locks.
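For concreteness, here is roughly the shape of what we’re doing. This is a trimmed-down sketch rather than our production code: the function names are ours, the includes are abbreviated, and we assume the UWorld outlives the tasks.

```
// Trimmed-down sketch of our setup (not production code). Each task runs a sweep
// on a worker thread; a dependent task consumes the results and can launch more.
// Includes are abbreviated; exact headers vary by engine version.
#include "Tasks/Task.h"
#include "Engine/World.h"

UE::Tasks::TTask<TArray<FHitResult>> LaunchBackgroundSweep(UWorld* World, FVector Start, FVector End, float Radius)
{
	// Assumption: World remains valid until these tasks complete.
	return UE::Tasks::Launch(UE_SOURCE_LOCATION, [World, Start, End, Radius]()
	{
		TArray<FHitResult> Hits;
		FCollisionQueryParams Params(SCENE_QUERY_STAT(BackgroundSweep), /*bTraceComplex*/ false);
		// Each sweep takes a read lock on the scene query structure internally.
		World->SweepMultiByChannel(Hits, Start, End, FQuat::Identity, ECC_Visibility,
			FCollisionShape::MakeSphere(Radius), Params);
		return Hits;
	});
}

void LaunchFollowUpWork(UE::Tasks::TTask<TArray<FHitResult>> SweepTask)
{
	// The first task is a prerequisite, so the whole chain stays off the game thread.
	UE::Tasks::Launch(UE_SOURCE_LOCATION, [SweepTask]() mutable
	{
		const TArray<FHitResult>& Hits = SweepTask.GetResult();
		// ... inspect Hits and launch the next round of sweeps here ...
	},
	UE::Tasks::Prerequisites(SweepTask));
}
```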

I’m curious about this code here:

```
/** Controls the scene lock type. See above. */
#if WITH_EDITOR
#ifndef CHAOS_SCENE_LOCK_TYPE
#define CHAOS_SCENE_LOCK_TYPE CHAOS_SCENE_LOCK_RWFIFO_CRITICALSECTION
#endif
#else
#ifndef CHAOS_SCENE_LOCK_TYPE
#define CHAOS_SCENE_LOCK_TYPE CHAOS_SCENE_LOCK_FRWLOCK
#endif
#endif
```

In the editor, this implies the use of TRwFifoLock, and all is well: a write lock waits at most the duration of our longest single sweep (a handful of msec, which we could probably reduce by issuing a larger number of shorter sweeps).

In cooked builds, this implies the use of FWindowsRWLock, which apparently allows new read lock requests to succeed even after a write lock has been requested. This is bad news for us, because there will *always* be new read locks being requested, especially on computers with many, many cores. This appears to cause hitching, because the write locks then take a long time (tens of msec) to be satisfied. Indeed, https://learn.microsoft.com/en-us/windows/win32/Sync/slim-reader-writer--srw--locks includes the text “SRW locks are neither fair nor FIFO”.
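To make the failure mode concrete, here is an illustrative sketch (not engine code; the thread count, timings, and names are made up) of how continuous readers can starve a writer under an unfair reader/writer lock:

```
// Illustrative sketch only: with an unfair RW lock, overlapping readers can keep
// the lock continuously read-held, so a pending writer waits far longer than any
// single read. A fair/FIFO lock bounds the writer's wait by the longest read.
#include "Tasks/Task.h"
#include "HAL/CriticalSection.h"   // FRWLock
#include "HAL/PlatformProcess.h"
#include <atomic>

static FRWLock SceneLock;
static std::atomic<bool> bKeepReading{ true };

void StartReaders(int32 NumReaders)
{
	for (int32 Index = 0; Index < NumReaders; ++Index)
	{
		UE::Tasks::Launch(UE_SOURCE_LOCATION, []()
		{
			while (bKeepReading)
			{
				SceneLock.ReadLock();
				FPlatformProcess::Sleep(0.002f); // stand-in for a sweep of a few msec
				SceneLock.ReadUnlock();
				// With many readers, an unfair lock may admit this thread's next
				// ReadLock() before a WriteLock() that was requested earlier.
			}
		});
	}
}

void DoSceneUpdate()
{
	// Under the SRW-based lock this wait can stretch far beyond any single read;
	// under the FIFO lock it is bounded by the longest in-flight read.
	SceneLock.WriteLock();
	// ... update the acceleration structure ...
	SceneLock.WriteUnlock();
}
```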

Three questions:

1. Why is the lock type different between editor and non-editor builds?

2. If we change the code above to use CHAOS_SCENE_LOCK_RWFIFO_CRITICALSECTION in all configurations, how will we suffer?

3. Got any other advice about casting a lot of rays from worker threads while preserving main thread performance?

Thanks!

-Sergey

Hi Sergey, and thank you for your question.

I’ve taken a look through the history, and the editor-specific define was added to address a thread-local storage issue around 2 years ago. I’m unsure why it was done this way, so I’ll reach out and see what the thinking was.

To make sure I understand the specific concern: it sounds like you are going to have a lot of read locks (from the scene queries you are running), and you are concerned that the unfair locking may be prioritizing the read requests at the expense of the write lock requests (meaning the write never gets acquired and hence stalls the physics update for the scene query acceleration structure). Is that all correct?

All the best

Geoff Stacey

Developer Relations

Epic Games

That all makes sense. I’ve reached out to the dev side to get a bit more context and I’ll get back to you once I have more!

Geoff

Hi Sergey,

The history of that change is that we ran out of thread-local storage (TLS) slots (which can happen with lots of worlds open at once during development). We normally prefer the unfair lock due to better throughput, but we had to fall back to a fair lock in the editor because of that TLS limit.

For your scenario, you could switch to the fair lock at the expense of some throughput (I’d suggest profiling before and after to get an idea of that number), or you could potentially look at timeslicing/yielding to give the write lock a chance to succeed.
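To illustrate the timeslicing idea (a sketch only; the function name, batch size, and parameters are made up, and the exact headers vary by engine version): splitting the work into short batches keeps each read hold brief and leaves gaps in which a pending write lock can get a turn.

```
// Sketch of the timeslicing idea (illustrative only). Short batches keep each
// read lock brief, and the pause between batches gives a pending write lock a
// chance to be acquired before this worker grabs the scene again.
#include "Tasks/Task.h"
#include "Engine/World.h"
#include "HAL/PlatformProcess.h"
// (Other collision headers omitted; exact includes vary by engine version.)

void RunSweepsTimesliced(UWorld* World, TArray<FVector> Starts, TArray<FVector> Ends, float Radius)
{
	UE::Tasks::Launch(UE_SOURCE_LOCATION, [World, Starts = MoveTemp(Starts), Ends = MoveTemp(Ends), Radius]()
	{
		const int32 BatchSize = 32; // tune so one batch stays around a millisecond

		for (int32 Begin = 0; Begin < Starts.Num(); Begin += BatchSize)
		{
			const int32 BatchEnd = FMath::Min(Begin + BatchSize, Starts.Num());
			for (int32 Index = Begin; Index < BatchEnd; ++Index)
			{
				TArray<FHitResult> Hits;
				// Each call acquires and releases the scene read lock internally.
				World->SweepMultiByChannel(Hits, Starts[Index], Ends[Index], FQuat::Identity,
					ECC_Visibility, FCollisionShape::MakeSphere(Radius),
					FCollisionQueryParams(SCENE_QUERY_STAT(TimeslicedSweep), /*bTraceComplex*/ false));
				// ... collate Hits ...
			}
			// Back off briefly between batches so this worker isn't immediately
			// re-acquiring the read lock while a write lock is waiting.
			FPlatformProcess::Sleep(0.0f);
		}
	});
}
```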

All the best

Geoff

Sounds good, let us know if you have any issues!

Geoff

> you are going to have a lot of read locks (from the scene queries you are running), and you are concerned that the unfair locking may be prioritizing the read requests at the expense of the write lock requests (meaning the write never gets acquired and hence stalls the physics update for the scene query acceleration structure). Is that all correct?

Yes, that’s exactly the pattern we’ve observed in Insights. We instrumented the ReadLock and WriteLock functions of TRwFifoLock (apparently the good one) and FPhysicsRwLock (the one we suffer with) with TRACE_CPUPROFILER_EVENT_SCOPE_STR so the lock-wait durations show up in Insights.
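In case the detail is useful: the instrumentation was roughly the following shape (paraphrased; in practice we added the trace scopes directly inside the ReadLock/WriteLock members of the two lock types rather than wrapping them as shown here).

```
// Rough shape of our instrumentation (illustrative wrapper, not the actual engine
// diff): a trace scope around the acquisition so any blocking wait shows up as a
// named event in Insights.
#include "HAL/CriticalSection.h"                    // FRWLock
#include "ProfilingDebugging/CpuProfilerTrace.h"    // TRACE_CPUPROFILER_EVENT_SCOPE_STR

struct FTracedSceneLock
{
	void ReadLock()
	{
		// Any time spent blocked here shows up as a named event in Insights.
		TRACE_CPUPROFILER_EVENT_SCOPE_STR("SceneLock ReadLock");
		Inner.ReadLock();
	}

	void WriteLock()
	{
		// In the FRWLock configuration this is the scope that balloons to tens of
		// msec on the game thread; with the FIFO lock it stays bounded by the
		// longest in-flight read.
		TRACE_CPUPROFILER_EVENT_SCOPE_STR("SceneLock WriteLock");
		Inner.WriteLock();
	}

	void ReadUnlock()  { Inner.ReadUnlock(); }
	void WriteUnlock() { Inner.WriteUnlock(); }

	FRWLock Inner;
};
```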

Just to recite our specific observations again:

With CHAOS_SCENE_LOCK_FRWLOCK:

  1. A dozen background worker threads are all calling SweepMultiByChannel
  2. Each of those establishes a read lock
  3. Once a single read lock is established, subsequent read locks succeed almost immediately
  4. There’s always more read locks being requested
  5. A write lock (with CHAOS_SCENE_LOCK_FRWLOCK) must wait until no read locks are held
  6. That moment effectively never comes, because we have many seconds of work queued and a *lot* of read locks
    1. One worker of 12 releases its read lock, but the rest of them still hold theirs
    2. That same worker can then immediately acquire its next read lock, before the others release theirs
  7. The writer hangs until all the reads are done - this takes several seconds

In contrast, with CHAOS_SCENE_LOCK_RWFIFO_CRITICALSECTION, the following happens:

  1. A dozen background worker threads are all calling SweepMultiByChannel
  2. Each of those establishes a read lock
  3. Once a single read lock is established, subsequent read locks succeed almost immediately
  4. There’s always more read locks being requested
  5. A write lock request (with CHAOS_SCENE_LOCK_RWFIFO_CRITICALSECTION) blocks further read lock acquisition
    1. This is the difference from the other version
  6. All currently-in-progress reads finish, new reads are blocked from starting
  7. The writing thread has to wait at most the length of the longest read lock
  8. Write lock acquired in relatively short order, current frame is not blocked
  9. Once the write lock is released, subsequent read locks begin again.

We’re wondering why TRwFifoLock isn’t the default in all configurations, because it’s certainly mitigating a lot of our problems. So with that additional context (or perhaps a clearer re-statement of your re-statement of our initial statement, lol), the 3 questions remain:

1. Why is the lock type different between editor and non-editor builds?

2. If we change the code above to use CHAOS_SCENE_LOCK_RWFIFO_CRITICALSECTION in all configurations, how will we suffer?

3. Got any other advice about casting a lot of rays from worker threads while preserving main thread performance?

One additional detail is that our target platforms are XSX, PS5, and PC (with some minimum core count that approximately matches the consoles).

Thanks again!

-S

Sure, ok. We’ve been happily living the FIFO life in all configurations for the past 3 weeks, and according to our profiles it’s a net benefit to us.

Thanks!

-S